Series Editor’s Introduction


This volume introduces social scientists to a new and rapidly develop- 
ing methodology for the analysis of longitudinal data on the occurrence 
of events. It is remarkable just how many important phenomena can be 
thought of as events—divorces, retirements, crimes, strikes, bankrupt- 
cies, and wars are only a few examples. Increasingly, researchers are 
collecting longitudinal data on both events and their possible causes. 

Unfortunately, ordinary multiple regression is ill suited to the analy- 
sis of event history data. As Allison explains, “censoring” and time- 
varying explanatory variables can produce severe bias and loss of 
information. Although methods for dealing with these problems effec- 
tively have recently become available, most of the literature is difficult 
for social scientists to use. Many of these procedures were developed by 
biostatisticians and engineers who have little interest in or knowledge of 
the substantive concerns of social scientists. Even methods developed by 
sociologists and economists have failed to reach a wide audience 
because the presentations have been highly mathematical. 

Event History Analysis is the first monograph-length treatment of 
these methods at an elementary level. It makes a major contribution in 
unifying and demystifying the very scattered and technical literature on 
this topic. The underlying theme is that the application and
interpretation of these methods are not all that different from ordinary multiple
regression analysis. Indeed, anyone with a good practical knowledge of 
multiple regression should have little difficulty reading this volume. 

The emphasis, then, is on regression models. Chapter 1 discusses the
problems with ordinary multiple regression. In Chapter 2, Allison 
shows how discrete-time event history data can be analyzed with logit 
regression models. Chapter 3 presents parametric regression models, 
including the exponential, Weibull, and Gompertz models. Chapter 4 
explains the extremely popular proportional hazards model, along with 
its associated estimation technique, partial likelihood. Later chapters 
show how these basic models can be applied to more complicated data 
structures. 


Given its length, this work is surprisingly comprehensive. It covers 
nearly all the models that a social scientist is likely to find useful. For 
each model, there is a discussion of assumptions, estimation methods, 
and problems that may arise. The major models are amply illustrated 
with detailed analyses of data on arrests of released prisoners and job 
changes by biochemists. Allison also gives good advice on choosing 
among alternative models, and on evaluating the fit of a chosen model. 

There is a wealth of practical information on the use of these 
methods, especially on computational considerations. Throughout the 
text, relevant computer programs are mentioned and evaluated. 
Appendix C describes and compares the major features of six different 
programs for doing event history analysis. Appendix B gives annotated 
program listings for the empirical examples presented in the main text. 

Above all, this volume is exceptionally clear and easy to read. It tells 
the reader virtually all he or she needs to know to use event history
methods effectively, and it does so with a minimum of mathematical
notation. It should have a major impact in making these methods both
attractive and accessible to social scientists. 


Richard G. Niemi 
Series Co-Editor


Longitudinal Event Data

1. INTRODUCTION


In virtually every area of the social sciences, there is great interest in 
events and their causes. Criminologists study crimes, arrests, convic- 
tions, and incarcerations. Medical sociologists are concerned with hos- 
pitalizations, visits to a physician, and psychotic episodes. In the study 
of work and careers, much attention is given to job changes, promo- 
tions, layoffs, and retirements. Political scientists are interested in riots, 
revolutions, and peaceful changes of government. Demographers focus 
on births, deaths, marriages, divorces, and migration. 

In each of these examples, an event consists of some qualitative 
change that occurs at a specific point in time. One would not ordinarily 
use the term “event” to describe a gradual change in some quantitative 
variable. The change must consist of a relatively sharp disjunction 
between what precedes and what follows. 

Because events are defined in terms of change over time, it is increas- 
ingly recognized that the best way to study events and their causes is to 
collect event history data. In its simplest form, an event history is a 
longitudinal record of when events happened to a sample of individuals 
or collectivities. For example, a survey might ask respondents to give the 
dates of their marriages, if any. If the aim is to study the causes of events, 
the event history should also include data on possible explanatory 
variables. Some of these variables, such as race, may be constant over 
time, while others, such as income, may vary. 

Although event histories are ideal for studying the causes of events, 
they typically possess two features, censoring and time-varying
explanatory variables, that create major problems for standard statistical
procedures such as multiple regression. In fact, the attempt to apply
standard methods can lead to severe bias or loss of information. In the 
last 15 years, however, several innovative approaches have been developed
to accommodate these two peculiarities of event history data.
Sociologists will be most familiar with the work of Tuma and her 
colleagues (Tuma, 1976; Tuma and Hannan, 1978; Tuma, Hannan, and
Groeneveld, 1979), but similar methods have arisen independently in
such diverse disciplines as biostatistics and engineering. Hence, there is 
no single method of event history analysis but rather a collection of 
related methods that sometimes compete but more often complement 
one another. 

This monograph will survey these methods with an eye to those 
approaches which are most useful for the kinds of data and questions 
that are typical in the social sciences. In particular, the focus will be on 
regression-like methods in which the occurrence of events is causally 
dependent on one or more explanatory variables. Although much attention
will be given to the statistical models that form the basis of event
history analysis, consideration will also be given to such practical con- 
cerns as data management, cost, and the availability of computer pro- 
grams. Before turning to these methods, let us first examine the difficul- 
ties that arise when more conventional procedures are applied. 


Problems in the Analysis of Event Histories 


To appreciate the limitations of [...] event history data, it is
helpful [...] was arrested at any time during [...] dummy variable was
the dependent [...] dichotomizing the dependent variable is [...]
information. It is arbitrary because there was
nothing special about the 12-month dividing line except that the study
ended at that point. Using the same data, one might just as well compare
those arrested before or after [...] because it ignores the variation
[...] might suspect, for example, that [...] release had a higher
propensity toward criminal activity than someone arrested 11 months later.

To avoid these difficulties, it is tempting to use the length of time from 
release to first arrest as the dependent variable in a multiple regression. 
But this strategy poses new problems. First, the value of the dependent 
variable is unknown or “censored” for persons who were not arrested at 
all during the one-year period. If the number of censored cases were 
small, it might be acceptable simply to exclude them. But 74 percent of 
the cases were censored in this sample, and it has been shown that 
exclusion of censored cases can produce large biases (Sørensen, 1977; 
Tuma and Hannan, 1978). An alternative solution might be to assign the 
maximum length of time observed—in this case one year—as the value 
of the dependent variable for the censored cases. But this obviously 
underestimates the true value and, again, substantial bias may result. 
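
A small simulation can make the two biases concrete. The sketch below is not taken from the recidivism study: it assumes, purely for illustration, that time to first arrest follows an exponential distribution whose rate leaves about 74 percent of cases unarrested after one year. The names `RATE`, `FOLLOW_UP`, and `simulate` are hypothetical.

```python
import math
import random

FOLLOW_UP = 1.0                # length of the observation window, in years
RATE = -math.log(0.74)         # assumed hazard: 74% remain unarrested after one year
TRUE_MEAN = 1.0 / RATE         # mean time to arrest under the assumed model


def simulate(n=10000, seed=0):
    """Return (share censored, mean after dropping censored, mean after capping)."""
    rng = random.Random(seed)
    times = [rng.expovariate(RATE) for _ in range(n)]
    observed = [t for t in times if t <= FOLLOW_UP]   # arrested within the year
    capped = [min(t, FOLLOW_UP) for t in times]       # censored cases set to one year
    share_censored = 1.0 - len(observed) / n
    return share_censored, sum(observed) / len(observed), sum(capped) / n


share_censored, mean_dropped, mean_capped = simulate()
print(f"assumed true mean: {TRUE_MEAN:.2f} years; censored share: {share_censored:.2f}")
print(f"mean with censored cases dropped: {mean_dropped:.2f}")
print(f"mean with censored cases set to one year: {mean_capped:.2f}")
```

Under these assumptions both shortcuts yield an average below one year, although the assumed mean time to arrest is above three years.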

Even if none of the observations were censored, one would still face 
another problem: how to include explanatory variables that change in 
value over the observation period. In this study, for example, individ- 
uals were interviewed monthly during the follow-up year to obtain 
information on changes in income, marital status, employment status, 
and the like. Although awkward, it might seem reasonable to include 12 
different income measures in the multiple regression, one for each 
month of follow-up. This might make sense for the person who is not 
arrested until the twelfth month, but it is surely inappropriate for the 
person arrested during the first month after release; his income after the 
first month should be irrelevant to the analysis. Indeed, the person may 
have been incarcerated during the remainder of the follow-up period so 
that income becomes a consequence rather than a cause of recidivism. In 
short, there is simply no satisfactory way of incorporating time-varying 
explanatory variables in a multiple regression predicting time of an 
event. 

These two problems—censoring and time-varying explanatory 
variables—are quite typical of event history data. Censoring is the more 
common difficulty because often the explanatory variables are mea- 
sured only once. Nevertheless, it is increasingly common to find longi- 
tudinal data sets with measurements of many variables at regular inter- 
vals. For most kinds of events, such data are essential to get accurate 
estimates of the effects of variables that change over time. 


An Overview of Event History Methods 


Event history data are by no means unique to the social sciences, and 
many of the most sophisticated approaches have been developed in 
other disciplines. This is a source of great confusion for the novice
because similar and sometimes identical ideas are often expressed in
quite different ways, typically in substantive contexts that are unfamil- 
iar to social scientists. It is helpful, then, to begin with a brief historical 
and comparative survey of this rapidly growing body of methods. 

From demography we get the earliest, best-known, and still widely 
used method for analyzing event history data—the life table. It is not a 
method I shall discuss in this monograph, however, because it is amply 
treated in standard demography texts (e.g., Pollard et al., 1981) and 
because it does not involve regression models with explanatory varia- 
bles. It should be noted, however, that one of the most influential 
regression methods—Cox’s (1972) partial likelihood method—was 
inspired by the fundamental ideas behind the life table. 

While the life table has been in use since the eighteenth century, it was 
not until the late 1950s and early 1960s that more modern methods for 
event history analysis were actively pursued. In the biomedical sciences, 
the substantive problem which called for such methods was the analysis 
of survival data and, indeed, much of the literature on event history 
methods goes under the names of survival analysis or lifetime analysis. 
For example, an experiment may be performed in which laboratory 
animals are exposed to different doses of some substance thought to be 
toxic or palliative. The experimenter then observes how long the ani- 
mals survive under each of the treatment regimens. Thus the “event” is 
the death of the animal. Censoring occurs because the experiment is 
usually terminated before all the animals die. Biostatisticians have 
produced a prodigious amount of literature on the most effective ways
to analyze such data (for a bibliography, see Kalbfleisch and Prentice, 
1980). These methods have become standard practice in the analysis of
data on the survival of cancer patients.


Meanwhile, engineers were facing similar problems in analyzing data
on the breakdown of machines and electronic components. The
methods they developed, which go by the name of “reliability” analysis
or “failure time” analysis, are quite similar in spirit but slightly
different in orientation from those of the biostatisticians (Nelson, 1982). In
recent years, however, these two traditions have effectively merged into
[...]

Social scientists were largely [...] because the substantive
concerns [...] Nevertheless, a vigorous tradition [...]
gap between the sociological approach and what had already been done
in biostatistics and engineering. It has taken some time, however, for 
social scientists to appreciate fully the close connections among these 
three intellectual streams. 

In the remainder of this chapter, I will delineate some of the major 
dimensions distinguishing different approaches to the analysis of event 
history data. In some cases, these dimensions effectively differentiate 
methods developed in sociology, biostatistics, and engineering. Other 
dimensions cut across the three disciplines. These dimensions serve as 
the organizing basis for the rest of this monograph. 


Distributional versus regression methods. Much of the early work on 
event history analysis can be described as the study of the distribution of 
the time until an event or the time between events. This is the main task 
of life table analysis, for example. Similarly, in applications of Markov 
processes to social science phenomena, a principal focus has been on the 
distribution of individuals across different states. More recently, all 
three disciplinary traditions have focused on regression models in which 
the occurrence of an event depends on a linear function of explanatory 
variables. As already noted, we will deal almost exclusively with regres- 
sion models here. 


Repeated versus nonrepeated events. Because the events of greatest 
interest to biologists are deaths, it is not surprising that biostatistical 
work has emphasized methods for single, nonrepeatable events. The 
same has been true of methods for failure time of industrial components. 
Social scientists, on the other hand, have emphasized the study of events 
like job changes and marriages which can occur many times over the 
lifetime of an individual. It might seem natural, then, for this mono- 
graph to focus on repeatable events. On the other hand, models for 
repeatable events tend to be more complicated and also raise a number 
of difficult statistical questions. Moreover, mastery of the methods for 
single events is essential for a full understanding of the more complex 
models. Accordingly, we shall spend a substantial amount of time on the 
simpler case. 


Single versus multiple kinds of events. In many cases, it is expedient 
to treat all the events in an analysis exactly the same. Thus, a study of job 
terminations may not distinguish one such termination from another. A 
life table may treat all deaths alike. In other cases, however, it may be 
desirable to distinguish different kinds of events. In the study of job 
terminations, it may be crucial to separate voluntary from involuntary 
terminations. In a study of the effectiveness of a cancer treatment, it is
obviously important to distinguish deaths due to cancer and deaths
from other causes. To accommodate different kinds of events, biostatis- 
ticians have developed methods for “competing risks” and demo- 
graphers have developed multiple decrement life tables. The generaliza- 
tions of Markov models developed by Tuma et al. (1979) also allow for 
multiple kinds of events. Again, however, the introduction of multiple 
kinds of events leads to complications that are best postponed until 
methods for single kinds of events are well understood. 


Parametric versus nonparametric methods. Biostatisticians have 
tended to favor nonparametric methods which make few if any assumptions
about the distribution of event times. Engineers and social scientists,
on the other hand, have gravitated toward models which assume
that the time until an event or the times between events come from very 
specific distributional families, the most common being the exponential, 
Weibull, and Gompertz distributions. A major bridge between these two 
approaches is the proportional hazards model of Cox (1972), which can 
be described as semiparametric or partially parametric. It is parametric 
insofar as it specifies a regression model with a specific functional form;
it is nonparametric insofar as it does not specify the exact form of the
distribution of event times. In this sense, it is roughly analogous to linear 
models that do not specify any distributional form for the error term. 


Discrete versus continuous time. Methods that assume that the time
of event occurrence is measured exactly are known as “continuous-time”
methods. In practice, time is always measured in discrete units,
however small. When these discrete units are very small, it is usually
acceptable to treat time as if it were measured on a continuous scale. On
the other hand, when the time units are large (months, years, or
[...]) [...] to use discrete-time methods (also
[...]). While continuous-time methods have [...] of all three disciplines,
there is also a sizable [...] work devoted to discrete-time methods,
especially in biostatistics ([...], 1975; Prentice and Gloeckler, 1978;
Mantel and Hankey, 1978; Holford, 1980; Laird and Olivier, 1981).
Because discrete-time methods are particularly easy to understand and
implement, they [...] a useful introduction to the basic principles of
event history analysis.


2. A DISCRETE-TIME METHOD


This chapter introduces discrete-time methods for [...]
of a single kind. While this is among the simplest situations, it involves
many of the fundamental ideas that are central to more complex forms
of data. At the same time, the method to be described is eminently
practical and can be applied in a great many situations. It can also be
generalized to allow for repeated events of different kinds (Allison,
1982).


TABLE 1
Distribution of Year of Employer Change, 200 Biochemists

          Number Changing    Number     Estimated
Year      Employers          at Risk    Hazard Rate
1           11               200        .055
2           25               189        .132
3           10               164        .061
4           13               154        .084
5           12               141        .085
>5         129
Total      200               848


A Discrete-Time Example 


Let us begin with an empirical example. The sample consists of 200 
male biochemists who received their doctorates in the late 1950s and 
early 1960s, and who at some point in their careers were assistant 
professors in graduate university departments. For a detailed descrip- 
tion of the sample, see Long, Allison, and McGinnis (1979). They were 
observed for a maximum of five years, beginning with the first year of 
their first positions as assistant professors. The event of interest is the 
first change of employers to occur after entry into the initial position. 
Thus, even though we are dealing with what is, in principle, a repeatable 
event, we define it to be nonrepeatable by restricting our attention to the 
first employer change. This is an appropriate strategy if one suspects 
that the process of leaving the first job differs from that of later jobs. 

These events are recorded in discrete time since we know only the year 
in which the employer change occurred, not the exact month and day. In 
theory it would be desirable to distinguish voluntary and involuntary 
changes, but that information is unavailable. Hence, we are dealing with
events of a single kind. Table 1 shows the number of biochemists who
changed employers in each of the five years. Of the 200 cases, 129 did not 
change employers during the observation period and are therefore 
considered to be censored.

Our goal is to estimate a “regression” model in which the probability
of an employer change in a one-year period depends on five explanatory 
variables. Two of these variables describe the initial employing institu- 
tion and are assumed to be constant over time: a measure of the prestige 
of the employing department (Roose and Andersen, 1970) and a mea- 
sure of federal funds allocated to the institution for biomedical research. 
Three variables describing individual biochemists were measured annu- 
ally: cumulative number of published articles, number of citations made 
by other scientists to each individual's life work, and academic rank 
coded 1 for associate professor and 0 for assistant professor. (Although
all the biochemists began the observation period as assistant professors,
some were later promoted.) Thus, we have both constant and time-varying
explanatory variables.


The Discrete-Time Hazard Rate 


We now proceed to the development of the model. A central concept 
in event history analysis is the risk set, which is the set of individuals who 
are at risk of event occurrence at each point in time. For the sample of 
biochemists, all 200 are at risk of changing employers during the first 
year and thus the entire sample constitutes the risk set in that year. Only 
11 of them actually did change employers in that year, and these 11 are 
no longer at risk during the second year. (They may be at risk of a second
employer change but we are only considering the first such change.)
Hence, at the end of each year the risk set is diminished by the number
who experienced events in that year. In Table 1, for example, we see that
the number in the risk set declines from 200 in year 1 to 141 in year 5.

The second key concept is the hazard rate, sometimes referred to as
simply the hazard or the rate. In discrete time, the hazard rate is the
probability that an event will occur at [...]
individual, given that the individual is at risk at that time. In the present
example, the hazard is the probability of making a first job change
within a particular year [...] the number of events by the number of
individuals at risk. For example, in the second year, 25 biochemists
changed employers out of 189 who were in the risk set. The estimated
hazard is then 25/189 = .132. Estimates for the other years are shown in
the last column of Table 1. There does not appear to be any tendency for
the hazard of an employer change to increase or decrease with time on 
the job. Note also that, because the risk set steadily diminishes, it is 
possible for the hazard rate to increase even when the number who 
change employers declines. The estimated hazard rate in year 3, for 
example, is greater than the hazard rate in year 1 even though more
persons changed employers in year 1. 
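
The arithmetic behind these estimates can be sketched in a few lines. The event counts are those of Table 1; the variable names are illustrative only.

```python
# Number changing employers in each of the five years, from Table 1.
events = [11, 25, 10, 13, 12]
n_start = 200

at_risk = []
remaining = n_start
for d in events:
    at_risk.append(remaining)   # risk set at the start of the year
    remaining -= d              # those who had an event leave the risk set

hazard = [d / n for d, n in zip(events, at_risk)]
censored = remaining            # still with the first employer after year 5

for year, (d, n, h) in enumerate(zip(events, at_risk, hazard), start=1):
    print(f"year {year}: {d:3d} events / {n:3d} at risk = {h:.3f}")
print("censored:", censored, "| total person-years:", sum(at_risk))
```

The year 3 hazard (10/164) exceeds the year 1 hazard (11/200) because the denominator has shrunk, even though fewer people changed employers.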


A Logit Regression Model 


The next step is to specify how the hazard rate depends on explana- 
tory variables. We shall denote the hazard by P(t), the probability that 
an individual has an event at time t, given that the individual is still at 
risk of an event at time t. For simplicity, let us suppose that we have just 
two explanatory variables: $x_1$, which is constant over time, and $x_2(t)$,
which has a different value at each time t. For the biochemistry example,
$x_1$ might be prestige of employing department and $x_2(t)$ might be
cumulative number of publications in year t.

As a first approximation, we could write P(t) as a linear function of 
the explanatory variables: 


$$P(t) = a + b_1 x_1 + b_2 x_2(t) \qquad [1]$$


for t = 1, ..., 5. A problem with this specification is that P(t), because it is
a probability, cannot be greater than one or less than zero, while the 
right-hand side of the equation can be any real number. Such a model 
can yield impossible predictions that create difficulties in both computa- 
tion and interpretation. This problem can be avoided by taking the logit 
transformation of P(t): 


$$\log\left(\frac{P(t)}{1 - P(t)}\right) = a + b_1 x_1 + b_2 x_2(t) \qquad [2]$$


As P(t) varies between 0 and 1, the left-hand side of this equation varies 
between minus and plus infinity. There are other transformations that 
have this property, but the logit is the most familiar and the most 
convenient computationally. The coefficients $b_1$ and $b_2$ give the change
in the logit (log-odds) for each one-unit increase in $x_1$ and $x_2$,
respectively.
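
To see why the transformation helps, the following sketch compares a linear specification with a logit specification for the same linear predictor. The coefficient and covariate values are hypothetical, chosen only so that the linear form leaves the unit interval.

```python
import math


def logit(p):
    """Log-odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))


def inverse_logit(z):
    """Map any real number z back to a probability strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical values, not estimates from the biochemistry data.
a, b1, b2 = 0.05, 0.02, 0.10
x1, x2 = 3.0, 12.0

predictor = a + b1 * x1 + b2 * x2
linear_p = predictor                  # equation 1: predictor read directly as P(t)
logit_p = inverse_logit(predictor)    # equation 2: predictor read as a log-odds

print(f"linear specification: {linear_p:.2f}")
print(f"logit specification:  {logit_p:.2f}")
```

Here the linear specification yields a "probability" above one, while the logit specification always returns a value between 0 and 1.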

The model is still somewhat restrictive because it implies that the only 
changes that occur in the hazard over time are those which result directly 
from changes in $x_2$, the time-varying explanatory variable. In most
cases, there are reasons to suspect that the hazard changes autonomously
with time. With job changes, for example, one [...]
long-term decline in the hazard simply because [...]
more invested in a job and, hence, the costs of moving increase. On the
other hand, untenured academic jobs might show an increase in the
hazard after about six years when many individuals are denied
[...]

In the discrete-time model, one can allow for any variation in the
hazard by letting the intercept a be different at each point in discrete
time. Thus we can write


$$\log\left(\frac{P(t)}{1 - P(t)}\right) = a(t) + b_1 x_1 + b_2 x_2(t) \qquad [3]$$


where a(t) refers to five different constants, one for each of the years
that are observed. As we shall see, these constants are estimated by
including a set of dummy variables in the specified model.


Estimating the Model 


The next problem is to estimate the parameters $b_1$, $b_2$, and the five
values of a(t). As with the models we shall consider, estimation is best
done by maximum likelihood or some closely related procedure. The
principle of maximum likelihood is to choose as coefficient estimates
those values which maximize the probability of observing what has, in
fact, been observed. To accomplish this one must first express the
probability of the observed data as a function of the unknown coefficients.
Then one needs a computational method to maximize this function.
Both of these steps are somewhat difficult mathematically, and
neither is crucial to a good working knowledge of how to estimate the
model. For further details, the interested reader should consult Allison
(1982). Happily, estimation reduces to something that is now familiar to
[...] dichotomous dependent variables.

[...] each [...] be at risk, a separate observational record is
created. In our biochemistry example where individuals are persons and
time is measured in years, it is natural to refer to these observations as
person-years. [...] in year 5 [...] contribute the maximum of 5
person-years. For the 200 biochemists, there were a total of 848
person-years. From Table 1 it can be seen that this total is just the sum of the
number at risk in each of the five years.

For each person-year, the dependent variable is coded 1 if a person
changed employers in that year; otherwise it is coded 0. The explanatory
variables are assigned the values they took on in each person-year.¹ The
final step is to pool the 848 person-years into a single sample, and then
estimate logit models for a dichotomous dependent variable using the 
method of maximum likelihood. Programs for maximum likelihood 
logit analysis are now widely available, for example, in the SAS (SAS 
Institute, 1983), BMDP (Dixon, 1981), SPSSX (SPSS Inc., 1984), or 
GLIM (Baker and Nelder, 1978) statistical packages. 

Notice how the two problems of censoring and time-varying explana- 
tory variables are solved by this procedure. Individuals whose time to 
the first job change is censored contribute exactly what is known about 
them, namely that they did not change jobs in any of the five years in 
which they were observed. Time-varying explanatory variables are eas- 
ily included because each year at risk is treated as a distinct observation. 
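
The construction of person-year records can be sketched as follows. This is a minimal illustration rather than the program used for the analysis: the function `to_person_years`, the field names, and the placeholder covariate values are all hypothetical, and only the event counts come from Table 1.

```python
def to_person_years(person_id, last_year, changed, constant, yearly):
    """Build one record per year at risk for a single individual.

    last_year: last year in which the person was observed at risk (1 to 5)
    changed:   True if the employer change occurred in last_year, False if censored
    constant:  dict of covariates that do not vary over time
    yearly:    dict mapping a covariate name to a list of yearly values
    """
    records = []
    for year in range(1, last_year + 1):
        record = {
            "id": person_id,
            "year": year,
            "event": 1 if (changed and year == last_year) else 0,
        }
        record.update(constant)
        for name, values in yearly.items():
            record[name] = values[year - 1]   # value in force during this year
        records.append(record)
    return records


# Rebuild the sample sizes of Table 1 using placeholder covariate values.
events_by_year = {1: 11, 2: 25, 3: 10, 4: 13, 5: 12}
n_censored = 129
placeholder_constant = {"prestige": 0.0}
placeholder_yearly = {"rank": [0, 0, 0, 1, 1]}

pooled = []
next_id = 0
for year, count in events_by_year.items():
    for _ in range(count):
        pooled += to_person_years(next_id, year, True,
                                  placeholder_constant, placeholder_yearly)
        next_id += 1
for _ in range(n_censored):
    pooled += to_person_years(next_id, 5, False,
                              placeholder_constant, placeholder_yearly)
    next_id += 1

print(len(pooled), "person-years,", sum(r["event"] for r in pooled), "events")
```

A censored individual contributes five records, all with the dependent variable coded 0, and a time-varying covariate simply takes a different value on each record. The pooled records could then be passed to any maximum likelihood logit program.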


Estimates for the Biochemistry Example 


Let us see what happens when this procedure is applied to the 
biochemistry data. Table 2 reports estimates for Model 1 which does not 
allow the hazard rate to vary autonomously with time. The coefficient 
estimates are like unstandardized regression coefficients in that they 
depend on the metric of each independent variable. For our purposes, it 
is instructive to focus on the t-statistics for the null hypothesis that each 
coefficient is zero. (The column labeled OLS t will be discussed later.) 
These are metric-free and give some indication of the relative impor- 
tance of the variables. 

Three of the variables have a significant impact on the hazard rate for 
changing employers. Specifically, biochemists with many citations are 
more likely to change employers, while associate professors and those 
employed at institutions receiving high levels of funding are less likely to 
change employers. (These results suggest that most of the job changes 
are voluntary.) Prestige of department and number of publications seem 
to make little difference. 

Model 2 allows the hazard rate to be different in each of the five years, 
even when other variables are held constant. This was accomplished by 
creating a set of four dummy variables, one for each of the first four 
years of observation. Coefficient estimates and test statistics are shown 
in Table 2. The coefficient for each dummy variable gives the difference 
in the logit of changing employers in that year and the logit of changing 
employers in year 5, holding other variables constant. No clear pattern 
emerges from these coefficients, although there does appear to be some
tendency for the hazard rate to increase with time. In this example, the
introduction of the dummy variables makes little difference in the
estimated effects of the other variables, but this will not always be the
case.


TABLE 2
Estimates for Logit Models Predicting the Probability of
an Employer Change, 848 Person-Years

                            Model 1                       Model 2
                      b        t       OLS t        b        t       OLS t
Prestige of
  Department         .045      .21      .22        .056      .26      .25
Funding             -.077    -2.45*   -2.34       -.078    -2.47*   -2.36
Publications        -.021     -.75     -.86       -.023     -.19    [...]
Citations            .0072    2.44*    2.36        .0069    2.33*    2.23
Rank (D)(a)        -1.4      -2.86**  -2.98      -1.6      -3.12**  -3.26
Year 1 (D)                                         -.96    -2.11*   -2.07
Year 2 (D)                                         -.025    [...]    [...]
Year 3 (D)                                        [...]    -1.60    -1.54
Year 4 (D)                                        [...]      .42      .38
Constant            4.95                           2.35

Log-likelihood   -230.95                          [...]

a. (D) indicates dummy variable.
*Significant at .05 level, 2-tailed test.
**Significant at .01 level, 2-tailed test.
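
To show what the year dummies capture, consider the special case with no other explanatory variables. A logit model with a separate intercept for each year is then saturated, and its maximum likelihood estimates are simply the logits of the yearly proportions in Table 1. The sketch below relies on that general property; it is not a computation reported in the text, and its values differ from the year coefficients of Model 2, which also controls for the five explanatory variables.

```python
import math

events = [11, 25, 10, 13, 12]          # Table 1: number changing employers
at_risk = [200, 189, 164, 154, 141]    # Table 1: number at risk


def logit(p):
    """Log-odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))


# a(1), ..., a(5): one intercept per year in a model without covariates.
a = [logit(d / n) for d, n in zip(events, at_risk)]

constant = a[4]                                     # year 5 is the reference year
year_dummies = [a_t - constant for a_t in a[:4]]    # contrasts for years 1 to 4

for year, b in enumerate(year_dummies, start=1):
    print(f"Year {year} (D): {b:+.3f}")
print(f"Constant:   {constant:+.3f}")
```

Each dummy coefficient is the difference between the logit for that year and the logit for year 5, which is the interpretation given above for the year coefficients.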


The Likelihood-Ratio Chi-Square Test 


[...] other variables. The [...] that are in another model, but also
[...] variables. The test statistic is constructed from a [...] likelihood
estimation, the maximized value of [...] is given in Table 2 for each of
the two [...] of two models [...] One calculates twice the
positive difference between [...] chi-
square distribution. The associated degrees of freedom will be the
number of constraints that distinguish the two models; in most cases, 
this will just be the difference between the numbers of variables in the 
two models. In this example, twice the difference in the log-likelihoods is 
9.4. Since Model 1 has four fewer variables than Model 2, there are four
degrees of freedom. This chi-square value is just below the critical value 
for the .05 level of significance. Thus the evidence is marginal that the 
hazard rate varies autonomously with time. 

This procedure for comparing log-likelihoods to test hypotheses 
about sets of variables is quite generally applicable for maximum likeli- 
hood estimation. It can therefore be applied to any of the models and 
estimation procedures to be discussed in later chapters. 


Problems with the Discrete-Time Method 


In this example, the number of constructed person-years was quite 
manageable with respect to computation. On the other hand, when a 


large sample is followed over a long interval divided into small discrete
units of time, the resulting number of observations may be impractically
large. In the biochemistry example, using person-months instead of
person-years would have produced nearly 10,000 observations. One can
al[...] example because most ac[...] the academic year.)


Allison (1982) discusses several ways in which discrete-time methods
can b[...] model can be done by log[...] log-linear models the
computational cost depends on the number of cells in the contingency
table, not the number of observations in the cells. [...] matrix is
constructed, alternative models can be [...] at extremely low [...]
OLS regressions were performed on the 848 biochemistry person-years
wi[...] variable coded 1 if [...] zero. The t-statistics for these
regressions are given in Table 2, next to the t-statistics for the logit
model. The results are remarkably similar. (For guidance as to when
least squares will give a good approximation, see Goodman, 1975.)


Discrete Versus Continuous Time 


Before moving on to continuous-time methods, it must be emphasized
that the discrete-time method described here will virtually always produce
results that are quite similar to the continuous-time methods described
later. In fact, as the time units get smaller and smaller, the discrete-time
model of equation 3 converges to the proportional hazards model of
Chapter 4. While there is some loss of information that comes from not
knowing the exact time of the event, this loss will usually make little
difference in the estimated standard errors.

Thus, the choice between discrete- and continuous-time methods
should generally be made on the basis of computational cost and
convenience. When there are no time-varying explanatory variables, it is
usually simpler to do event-history analysis using one of the methods
described in the next two chapters. This is largely due to the fact that the
continuous-time methods do not require that the observation period for
each individual be subdivided into a set of distinct observational units.
When there are time-varying explanatory variables, on the other hand,
the relative costs and convenience of using continuous-time and
discrete-time methods are quite comparable.


3. PARAMETRIC METHODS FOR 
CONTINUOUS-TIME DATA 

Although the discrete-time method just discussed is widely applicable,
most event history data are analyzed using continuous-time methods.
In this chapter we shall [...] every aspect of the model is completely
specified except for the values of certain parameters, which must be
estimated. [...] situations in which each individual [...] and all
events are treated alike. [...] many closely related a[...]. Although a
full understanding of these methods requires knowledge of calculus
(including simple ordinary differential equations) and maximum likelihood, it
is possible to become an intelligent user with only a modest mathemati- 
cal background. 


The Continuous-Time Hazard Rate 


What nearly all these methods share is the notion of the hazard rate as 
the fundamental dependent variable. In the previous chapter, the 
discrete-time hazard was defined as the probability that an individual 
experiences an event at time t, given that the individual was at risk at 
time t. This definition will not work in continuous time, however, 
because the probability that an event occurs at exactly time t is infinitesi- 
mal for every t. Instead, consider the probability that an individual 
experiences an event in the interval from t to t + s, given that the 
individual was at risk at time t, and denote this probability by P(t, t + s). 
When s = 1, this is equivalent to the discrete-time hazard defined in
Chapter 2. Next we divide this probability by s, the length of the interval, 
and let s become smaller and smaller until the ratio reaches a limit. This 
limit is the continuous-time hazard, denoted by h(t). Other common 
symbols for the hazard rate are λ(t) and r(t). Formally,


    h(t) = lim_{s -> 0} P(t, t + s)/s                      [4]


Although it may be helpful to think of this as the instantaneous 
probability of event occurrence, it is not really a probability because it 
can be greater than 1. In fact, it has no upper bound. A more accurate 
interpretation is to say that h(t) is the unobserved rate at which events 
occur. Specifically, if h(t) is constant over time, say h(t) = 1.25, then 1.25 
is the expected number of events in a time interval that is one unit long. 
Alternatively, 1/h(t) gives the expected length of time until an event 
occurs, in this case .80 time units. This way of defining the hazard 
corresponds closely to intuitive notions of risk. For example, if two 
persons have hazards of .5 and 1.5, it is appropriate to say that the 
second person’s risk of an event is three times greater. 
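
A small numerical sketch may help fix these ideas. It assumes a constant
hazard, for which the probability of an event in an interval of length s
is 1 - exp(-hs); the function name is illustrative only.

```python
import math

def interval_event_prob(h, s):
    """P(t, t + s) under a constant hazard h: 1 - exp(-h * s)."""
    return 1.0 - math.exp(-h * s)

h = 1.25
# The ratio P(t, t + s)/s approaches h as the interval length s shrinks.
ratios = [interval_event_prob(h, s) / s for s in (1.0, 0.1, 0.001, 1e-6)]
print(ratios)

expected_wait = 1.0 / h      # expected time until an event
relative_risk = 1.5 / 0.5    # ratio of two constant hazards
```

With s = 1 the ratio is an ordinary probability and lies below 1, but as
s shrinks it climbs toward the hazard of 1.25, which is why the hazard
itself cannot be read as a probability.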

For most applications, it is reasonable to assume that the hazard rate 
changes as a function of time, either the time since the last event or the 
age of the individual. For example, available evidence indicates that, at 
least after age 25, the hazard rate for being arrested declines with age. On 
the other hand, the hazard for retirement certainly increases with age.
The hazard for death from any cause has a U shape: It is relatively high
immediately after birth, declines rapidly in the early years, and then
begins to rise again during late middle age.

It is important to realize that the shape of the hazard rate function is
one of the key distinguishing features of different models for 
continuous-time data. In fact, the hazard function h(t) completely 
determines the probability distribution of the time until an event (or the 
time between events when events are repeatable). Later in this chapter
we shall see how one might go about choosing a shape for the hazard
function. 


Continuous-Time Regression Models 


The next step is to develop models for the dependence of h(t) on time 
and on the explanatory variables. We shall consider three models—the 
exponential, the Weibull, and the Gompertz—that differ only in the way 
that time enters the equation. To keep it simple, let us assume that we 
have only two explanatory variables, x_1 and x_2, which do not vary over
time. An obvious approach would be to let h(t) be a linear function of 
the explanatory variables. This is awkward, however, because h(t) can- 
not be less than zero, and there is nothing to prevent a linear function 
from being less than zero. It is typical, then, to take the natural loga- 
rithm of h(t) before setting it equal to a linear function of the explana- 
tory variables. Thus, one of the simplest models is 


    log h(t) = a + b_1 x_1 + b_2 x_2                       [5]


where a, b_1, and b_2 are constants to be estimated. In this equation h(t)
is a function of the explanatory variables, but it does not depend on
time. A hazard that is constant over time implies an exponential
distribution for the time until event occurrence and, hence, this is often
referred to as the exponential regression model.

Specifying a constant hazard is usually unrealistic, however. If the
event is a death, for example, the hazard should increase with time, due
to aging of the organism. On the other hand, if the event is an employer
change, the hazard is likely to decline with time as the individual
becomes more invested in the job. We can relax the assumption of a
constant hazard by allowing the log of the hazard to increase or decrease
linearly with time, i.e.,



    log h(t) = a + b_1 x_1 + b_2 x_2 + ct                  [6]

where c is a constant which may be either positive or negative. Because
this function gives rise to a Gompertz distribution for the time until event
occurrence, it is convenient to refer to equation 6 as the Gompertz
regression model.

Alternatively, let us consider a model in which the log of the hazard
increases or decreases linearly with the log of time:


    log h(t) = a + b_1 x_1 + b_2 x_2 + c log t             [7]


where c is constrained to be greater than -1. This model generates a
Weibull distribution for the time until event occurrence. Hence, it is
commonly referred to as the Weibull regression model.

There are other models that differ only in the way that time
enters the equation, but these three are the most common. For additional
information on these three models, see Lawless (1982). Although
time appears to be just another explanatory variable in the Weibull and
Gompertz models, its role is much more fundamental. In particular, the
difference between equations 6 and 7 requires an entirely different
estimation procedure rather than a simple transformation from time to
log time.

Note that neither the Weibull model nor the Gompertz model allows
for a hazard with, say, a U shape; the hazard may either decrease or
increase with time, but it may not change direction. This is a disadvantage
in some applications. Later we shall consider some models that do not
have this restriction.

Notice also that none of these models has a random disturbance term.
They are not deterministic models, however, because there is random
variation in the relationship between the unobservable dependent
variable h(t) and the observed length of the time interval. Still, there are
some who argue that these models should include a disturbance term, an
issue that will be discussed at the end of this chapter.
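
The three specifications can be compared side by side in a short Python
sketch. The coefficient values are hypothetical and the function names are
illustrative; the point is only to show how time enters each equation.

```python
import math

def linear_part(a, b1, b2, x1, x2):
    """The part shared by equations 5, 6, and 7: a + b1*x1 + b2*x2."""
    return a + b1 * x1 + b2 * x2

def hazard_exponential(t, a, b1, b2, x1, x2):
    """Equation 5: the hazard does not depend on t."""
    return math.exp(linear_part(a, b1, b2, x1, x2))

def hazard_gompertz(t, a, b1, b2, x1, x2, c):
    """Equation 6: the log hazard is linear in t."""
    return math.exp(linear_part(a, b1, b2, x1, x2) + c * t)

def hazard_weibull(t, a, b1, b2, x1, x2, c):
    """Equation 7: the log hazard is linear in log t (t > 0, c > -1)."""
    if t <= 0 or c <= -1:
        raise ValueError("requires t > 0 and c > -1")
    return math.exp(linear_part(a, b1, b2, x1, x2) + c * math.log(t))

# Hypothetical parameter values, chosen only for illustration.
params = dict(a=-2.0, b1=0.3, b2=-0.1, x1=1.0, x2=2.0)
for t in (0.5, 1.0, 2.0, 4.0):
    print(t,
          round(hazard_exponential(t, **params), 4),
          round(hazard_gompertz(t, c=-0.2, **params), 4),
          round(hazard_weibull(t, c=-0.5, **params), 4))
```

With a negative c both the Gompertz and the Weibull hazards fall steadily
with time, while the exponential hazard stays flat; neither can turn
around to produce a U shape.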


Maximum Likelihood Estimation


Writing down models is easy. The difficulty comes in trying to
estimate them, especially with censored data. In the late 1960s,
statisticians developed maximum likelihood procedures for the exponential
regression model (Zippin and Armitage, 1966; Glasser, 1967), and it was
not long before maximum likelihood was available for many other
models. Appendix A discusses maximum likelihood estimation
of parametric models in some detail, but it is worth mentioning some of
the general properties here.

As an estimation method for censored data, maximum likelihood is
hard to beat. It combines the censored and uncensored observations in 
such a way as to produce estimates that are asymptotically unbiased, 
normally distributed and efficient (i.e., have minimum sampling var- 
iance). Unfortunately, “asymptotically” means that these properties are 
only approximations that improve as the sample gets larger. No one 
knows how well they hold in small samples or how large is large enough. 
In the absence of compelling alternative methods, however, maximum 
likelihood is widely used with both small and large samples. 
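
For the simplest case, an exponential model with no explanatory variables,
the maximum likelihood estimate has a closed form: the number of events
divided by the total observed time, censored or not. The sketch below uses
made-up data and illustrative function names; it is meant only to show how
censored cases enter the likelihood, not to reproduce any of the programs
cited in this chapter.

```python
import math

def exponential_loglik(h, times, events):
    """Log-likelihood for a constant hazard h with right-censored data.

    An event at time t contributes log(h) - h*t; a censored case
    contributes only -h*t (the log probability of surviving to t).
    """
    return sum(e * math.log(h) - h * t for t, e in zip(times, events))

def exponential_mle(times, events):
    """Closed-form maximum: number of events / total time at risk."""
    return sum(events) / sum(times)

# Hypothetical data: weeks observed, and 1 = event, 0 = censored.
times = [5, 12, 52, 30, 52, 8]
events = [1, 1, 0, 1, 0, 1]

h_hat = exponential_mle(times, events)

# Discarding the censored cases shrinks the time at risk and
# therefore overstates the hazard.
h_naive = sum(events) / sum(t for t, e in zip(times, events) if e == 1)
print(h_hat, h_naive)
```

The censored cases supply no events but do add to the time at risk, which
is how maximum likelihood uses the information they carry.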

There are many computer programs available to do maximum likeli- 
hood estimation of one or more of these models. The RATE program 
(Tuma, 1979) will estimate the exponential model, the Gompertz model, 
and several extensions of the Gompertz model. The GLIM program 
(Baker and Nelder, 1978) will estimate the exponential and Weibull
regression models (in version 3), but only by employing special
procedures that are not documented in the manual (Aitkin and Clayton, 1980;
Roger and Peacock, 1983). Weibull and exponential models can also be 
estimated with two author-distributed programs, CENSOR (Meeker 
and Duke, 1981) and SURVREG (Preston and Clarkson, 1983). 


An Empirical Example 
To illustrate these methods, we shall apply the exponential regression
model to the criminal recidivism data (Rossi et al., 1980) that were
briefly described in Chapter 1. The sample consisted of 432 males who
were followed for one year after their release from Maryland state
prisons. The study was actually a randomized field experiment in which
approximately half the men received financial assistance while the other
half served as a control group. During the follow-up year, the subjects
were interviewed monthly regarding their experiences during the previous
month. At the end of the year, a search was made through district
court records for data on arrests and convictions.

The event of inte[...] determine how [...] arrests, parole status,
[...]ce, and number of weeks employed during the [...] release. With
the exception [...] variables are clearly constant in value over the
follow-up period. While employment status is obviously changeable over
the full year after release, this analysis will treat it as constant
over time. Later, we shall examine a model allowing employment status
to vary over time.


TABLE 3
Coefficient Estimates for Three Models of Recidivism

                               1                    2                    3
                                                                  Time-Dependent
                                               Proportional        Proportional
                          Exponential            Hazards             Hazards
Explanatory Variables      b        t           b        t           b        t
Financial aid (D)a       -.325   -1.69        -.337   -1.76        -.333   -1.74
Age at release           -.067   -2.89**      -.069   -2.94**      -.064   -2.78**
Black (D)                 .280     .90         .286     .92         .354    1.13
Work experience (D)      -.117    -.33        -.122    -.55        -.012    -.06
Married (D)              -.414   -1.08        -.426   -1.11        -.334    -.87
Paroled (D)              -.037    -.19        -.035    -.18        -.075    -.38
Prior arrests             .095    3.21**  [illegible]  3.36**       .100  [illegible]
Age at earliest arrest    .070    2.30*        .071    2.35*        .077    2.48*
Education                -.263   -1.96*       -.264   -1.96*       -.293   -2.12*
Weeks worked             -.039   -1.76        -.039   -1.78          -       -
Worked (D)                                                        -1.397   -5.65**
Constant                -3.860                  -                    -

a. (D) indicates dummy variable.
*Significant at .05 level.
**Significant at .01 level.


Most programs for estimating event history models require that the 
data on the dependent variable be input in two parts: a dummy variable
indicating whether or not the event (in this case an arrest) occurred 
during the observation period, and a variable giving either the time of 
the event (if it occurred) or the time of censoring. In this example, time 
was measured in weeks since release. Thus, for those who were arrested, 
the second component of the dependent variable was the number of 
weeks from release to arrest. For those who were not arrested, the week 
number was 52, the last week that they were observed. Estimates for an 
exponential regression model were obtained with the GLIM program 
(see Appendix B for program listing), and are reported in panel 1 of
Table 3. 
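
The two-part coding of the dependent variable can be sketched as follows.
The function name and the sample records are hypothetical; the only
assumption taken from the text is that follow-up ends at week 52.

```python
def code_outcome(arrest_week, follow_up_weeks=52):
    """Return (event, time) for one released prisoner.

    arrest_week is the week of first arrest, or None if no arrest
    was observed during the follow-up period.
    """
    if arrest_week is None:
        return 0, follow_up_weeks   # censored at the end of observation
    return 1, arrest_week           # event observed at arrest_week

# Hypothetical records: week of arrest, or None if never arrested.
raw = [17, None, 3, None, 44]
coded = [code_outcome(week) for week in raw]
print(coded)
```

Each case thus carries a dummy variable for whether the arrest occurred and
a time that is either the week of arrest or the week of censoring.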

Interpreting the coefficient estimates is much like interpreting 
unstandardized regression coefficients. For example, the coefficient of
-.067 for age at release means that each additional year of life reduces
the log of the hazard by .067, controlling for other variables. A somewhat
more intuitive interpretation is obtained by exponentiating the
coefficients (taking their antilogs). That is, if b is the coefficient, compute
exp(b), which means raising the number e (approximately 2.718) to
the b power. The interpretation is then as follows: For each unit increase
in an explanatory variable, the hazard is multiplied by its exponentiated
coefficient. Further, computing 100(exp(b) - 1) gives the percentage
change in the hazard with each one unit change in the explanatory 
variable. For example, the coefficient for number of prior arrests is .095, 
and exp(.095) = 1.10. This tells us that each additional prior arrest 
increases the hazard by an estimated 10 percent. For dummy variables, 
the exponentiated coefficient gives the relative hazard for the groups 
corresponding to values of the dummy variable, again controlling for 
other variables. The coefficient for the dummy variable for financial aid, 
for instance, is -.325, which gives exp(-.325) = .72. This means that the 
hazard of arrest for those who received financial aid was about 72 
percent of the hazard for those who did not receive aid. Alternatively,
since exp(.325) = 1/exp(-.325) = 1.38, we can say that the hazard for those who did not
receive aid was about 38 percent larger than the hazard for those who 
did receive aid. 
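
These conversions are easy to check. The sketch below uses the coefficients
quoted in the text from panel 1 of Table 3; the function names are
illustrative only.

```python
import math

def hazard_multiplier(b):
    """Factor by which the hazard is multiplied per unit increase."""
    return math.exp(b)

def percent_change(b):
    """Percentage change in the hazard per unit increase: 100(exp(b) - 1)."""
    return 100.0 * (math.exp(b) - 1.0)

prior_arrests = hazard_multiplier(0.095)    # each additional prior arrest
financial_aid = hazard_multiplier(-0.325)   # aid versus no aid
no_aid_vs_aid = percent_change(0.325)       # no aid relative to aid
age_at_release = percent_change(-0.067)     # each additional year of age

print(round(prior_arrests, 2), round(financial_aid, 2),
      round(no_aid_vs_aid), round(age_at_release, 1))
```

Reversing the sign of a dummy variable's coefficient before exponentiating
gives the comparison in the opposite direction, which is the basis for the
38 percent figure.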

The ratios of the estimates to their standard errors are also useful
statistics. For moderate to large samples, these can be treated like
t-statistics in an ordinary multiple regression. Thus if the ratio exceeds 2,
the coefficient is significantly different from zero at the .05 level with a
two-tailed test. Also the relative sizes of these ratios can be used to gauge
the relative importance of the vari[...] being arrested at any point in time.


Censoring 


In both this example and the biochemistry example in Chapter 2,
censoring occurred at the same point in time for all individuals. [...]

In many situations, however, the censoring times will vary across
individuals. This occurs when individuals [...] for one
reason or another, before the end of the observation period. Possible
reasons include death, migration out of the population at risk, failure to 
locate the individual in later interviews, or refusal to continue in the 
study. When censoring times vary across individuals (and are not under 
the control of the investigator) censoring is said to be random. Random 
censoring also includes designs in which observation ends at the same 
time for all individuals, but begins at different times. 

When censoring is random, virtually all event history methods 
assume that the censoring times are independent of the times at which 
events occur, controlling for the explanatory variables in the regression 
model. This assumption would be violated, for example, if individuals 
who were more likely to be arrested were also more likely to migrate out 
of the study area. Although it is possible to develop models which allow 
for dependence between censoring and the occurrence of the event of 
interest, this is rarely done. The main reason why it is not often done 
(aside from the inconvenience of a nonstandard model) is that it is 
impossible to test whether any dependence model is more appropriate 
than the independence model (Tsiatis, 1975). In other words, the data 
can never tell you which is the correct model. 

It is possible, however, to get some idea of how sensitive one’s 
analysis is to violations of the independence assumption. In essence, the 
sensitivity analysis consists of reestimating the model twice, each time
treating the censored observations in a different extreme way. The first
step is to redo the analysis with the data altered so that censored
observations experience an event at the time of censoring. In most cases,
this is easily accomplished by recoding the dummy variable which
indicates whether or not an observation is censored so that all observations
have a value of 1. The second step is to redo the analysis so that the
censoring times are all equal to the longest time observed in the sample,
regardless of whether that time is censored or uncensored. Thus, even if
some of the released prisoners had been censored before the end of the
one-year period, their censoring times would be set to one year. If the
parameter estimates resulting from the standard analysis are similar to
those obtained from these two extreme situations, one can be confident
that violations of the independence assumption are unimportant (Peterson,
1976). Note that this approach to censoring can be used with any of
the methods discussed in later chapters. 
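
The two recodings are simple data manipulations. In the sketch below the
function names and the data are hypothetical, and events are coded 1 for an
observed event and 0 for a censored case; each altered data set would then
be passed to whatever estimation program is in use.

```python
def treat_censored_as_events(times, events):
    """First extreme: every censored case has an event at its censoring time."""
    return list(times), [1 for _ in events]

def extend_censored_to_longest(times, events):
    """Second extreme: every censoring time is set to the longest observed time."""
    longest = max(times)
    new_times = [t if e == 1 else longest for t, e in zip(times, events)]
    return new_times, list(events)

# Hypothetical data with censoring at varying times.
times = [5, 12, 20, 30, 52, 8]
events = [1, 0, 1, 0, 0, 1]

early = treat_censored_as_events(times, events)
late = extend_censored_to_longest(times, events)
print(early)
print(late)
```

If the estimates from the original data and from both altered data sets
are close, the independence assumption is unlikely to matter much for the
conclusions.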


Some Other Models 


The three models considered above (the exponential, the Weibull,
and the Gompertz) are all members of a general class of models known
as proportional hazards models. In the next chapter, we shall see how to
estimate this general class without having to choose a particular
member. Before leaving the subject of parametric models, however, let
us briefly consider another general class of models known either as
accelerated failure time models (Kalbfleisch and Prentice, 1980) or as
location-scale models (Lawless, 1982). If T is the elapsed time until an
event occurs, this class can be written as


    log T = a + b_1 x_1 + b_2 x_2 + ... + u                [8]


where u is a random disturbance term that is independent of the x’s. 
Different members of this class have different distributions for the 
disturbance term u. Distributions that are commonly assumed include 
the normal, log-gamma, logistic, and extreme value distributions. These 
give rise, respectively, to log-normal, gamma, log-logistic, and Weibull 
distributions for T. Thus, the Weibull regression model is a member of 
both the proportional hazards class and the accelerated failure time 
class. In fact, it can be shown that the Weibull (and its special case—the 
exponential) is the only model that falls into both of these classes. 
The accelerated failure time models can be reexpressed so that the
dependent variable is the hazard rate rather than log T, but these
expressions tend to be quite complicated. The lognormal and
log-logistic models are unusual in that the hazard is a nonmonotonic
function of time; it first increases, reaches a peak, and then gradually
declines with time. When there is no censoring of T, the accelerated
failure time models can be consistently estimated by ordinary least
squares regression of log T on the explanatory variables. In the presence
of censoring, however, one must usually resort to maximum likelihood
estimation. For details see Lawless (1982).
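
The uncensored case can be illustrated with simulated data. The sketch
below assumes a single explanatory variable and a normal disturbance (so
that T is lognormal); the parameter values, sample size, and function name
are all hypothetical.

```python
import math
import random

def ols_simple(x, y):
    """Intercept and slope from a least squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

random.seed(0)
a_true, b_true, sd_u = 1.0, 0.5, 0.8

# Simulate uncensored event times from log T = a + b*x + u.
x = [random.uniform(0.0, 4.0) for _ in range(2000)]
T = [math.exp(a_true + b_true * xi + random.gauss(0.0, sd_u)) for xi in x]

intercept, slope = ols_simple(x, [math.log(t) for t in T])
print(round(intercept, 2), round(slope, 2))
```

With every event time observed, the least squares estimates land close to
the values used to generate the data. Once some of the times are censored
this shortcut is no longer available.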


Choosing a Model 


How does one choose among alternative parametric models? As with
most statistical methods, it is rather difficult to codify the procedures
involved in choice of a model. There are many factors that should
legitimately enter the decision and none can be easily quantified.
Invariably there is tension among mathematical convenience, theoretical
appropriateness, and empirical evidence.


With models of the sort we have just been discussing, the key
differentiating factor is the way in which the hazard rate depends on time.
The first choice is between the exponential regression model, in which there
is no dependence on time, and all other models. From a mathematical
and computational point of view, the exponential model is very attractive.
For this reason, it is often useful as a first approximation even when
it is known to be false. Substantive theory, on the other hand, will 
usually suggest many reasons why the hazard should change with time. 
As for empirical evidence, there are well-known graphical methods for 
assessing whether event times have an exponential distribution (Gross 
and Clark, 1975; Elandt-Johnson and Johnson, 1980, Ch. 7; Lawless, 
1982, Ch. 2). These can be quite useful if the explanatory variables have 
relatively weak effects on the hazard. If effects are strong, however, the 
graphical methods may show a declining hazard even when the true 
hazard is constant over time. 

A better approach is to fit the exponential regression model with 
explanatory variables and then examine its fit to the data. This can be 
done by using the residual plots described by Lawless (1982) for evi- 
dence of departure from exponentiality. A more formal method for 
testing the fit of the exponential regression model derives from the fact 
that the exponential is a special case of several of the models considered 
thus far, including the Weibull, Gompertz, and gamma regression mod- 
els. The procedure is to fit both the exponential and one of these other 
models by maximum likelihood. The relative fit of the two models can 
then be tested by comparing log-likelihoods as described in Chapter 2. 
Rejection of the exponential model is indicated when its log-likelihood 
differs significantly from that of the alternative model. 

If the exponential model is rejected, one must then choose between 
monotonic models (in which the hazard always increases or always 
decreases with time) and nonmonotonic models (in which the hazard 
may sometimes increase and sometimes decrease). Again, both substantive
theory and the graphical methods referenced above can be helpful in
making this choice. As before, however, one must be wary of univariate
graphs for assessing the shape of the hazard function. Strong effects of 
explanatory variables can make the evidence misleading. As an alterna- 
tive to univariate graphical techniques, residual plots are available for 
assessing the fit of a chosen model (Lawless, 1982). 

As noted above, the lognormal and log-logistic models have nonmonotonic
hazard functions in which the hazard first increases and then
decreases. This might be appropriate for many kinds of social mobility
in which there is (a) an initial “resting period” after the previous move, 
(b) an increase in the risk of a move as the resting period is completed, 
and (c) a decline in the risk of a move as individuals become more 
invested in a particular social location. On the other hand, there is no 
convenient parametric model to represent U-shaped hazard functions.
If there is strong departure from monotonicity, it is often better to shift
to the semiparametric, proportional hazards model discussed in the next
chapter.

Within the class of monotonic models, choice of model will often be
based on mathematical and computational convenience. Social theory
and empirical evidence are typically inadequate to discriminate between,
say, a Weibull model and a Gompertz model.


Unobserved Sources of Heterogeneity 


Many social theories imply or suggest that the hazard rate for some
event should be increasing or decreasing with time. For example, certain
job search theories imply that the hazard rate for obtaining a job should
increase with the length of unemployment (Heckman and Singer, 1982). 
While the procedures described in the preceding section on model choice 
can be useful in testing such hypotheses, great caution is required in 
trying to draw inferences about the effect of time on the hazard rate. The 
basic problem was mentioned above, but is worth some elaboration 
here. Even if the hazard rate is constant over time for each individual, 
differences (across individuals) in the hazard rate that are not incorporated
into the model will tend to produce evidence for a declining hazard
rate (Heckman and Singer, 1982).

Intuitively, what happens is that individuals with high hazard rates
experience events early and are then eliminated from the risk set. As
time goes on, this selection process yields risk sets that contain individuals
with predominantly low risks. The upshot is that it is extremely
difficult to distinguish hazard rates that are truly declining with time
from simple variation in hazard rates across individuals. On the other
hand, if one observes evidence for an increasing hazard rate, this can
always be regarded as evidence that the hazard really increases with
time.
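
The selection effect can be made concrete with a population made up of two
groups, each with a constant hazard. The hazard observed among those still
at risk is a weighted average of the two group hazards, with weights given
by each group's share of the survivors. The group hazards and the mixing
proportion below are hypothetical.

```python
import math

def population_hazard(t, h_low=0.5, h_high=2.0, share_high=0.5):
    """Hazard among survivors at time t in a two-group mixture.

    Each group has a constant hazard; only the mix of survivors changes.
    """
    surv_high = share_high * math.exp(-h_high * t)
    surv_low = (1.0 - share_high) * math.exp(-h_low * t)
    return (h_high * surv_high + h_low * surv_low) / (surv_high + surv_low)

observed = [population_hazard(t) for t in (0.0, 1.0, 2.0, 5.0)]
print([round(h, 3) for h in observed])
```

Although no individual's hazard changes, the observed hazard starts at the
average of the two group hazards and drifts down toward the lower one as
the high-risk group is depleted.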


In particular, Heckman and Singer (1982) have considered an extended
Weibull model,


    log h(t) = a + b_1 x_1 + b_2 x_2 + c log t + u         [9]

where u is a random disturbance. In principle, estimation of such a
model should allow one to separate the effects of time from the unobserved
heterogeneity. In practice, they have found that estimates of c
and the b coefficients are highly sensitive to the choice of a particular
distribution for u. Although work is being done to remedy this problem,
it is still too early to conclude whether the approach will be generally
useful.

Biostatisticians have been remarkably unconcerned about the prob- 
lem of unmeasured sources of heterogeneity, though there is every 
reason to suspect that it will be just as serious for biological as for social 
phenomena. Their attitude seems to be that the consequence of such 
heterogeneity will be mainly to change the shape of the distribution of T 
(the time of the event), and that this can be accommodated by specifying 
a different distribution for T (e.g., a Gompertz instead of a Weibull), or
by using a more general model (e.g., the proportional hazards model
considered in the next chapter). This position is reasonable so long as
one is primarily concerned with estimating the effects of the explanatory
variables and is not particularly interested in testing hypotheses about
the effect of time.


4. PROPORTIONAL HAZARDS AND 
PARTIAL LIKELIHOOD 


The methods discussed and applied in Chapter 3 represent a tre- 
mendous advance over ad hoc approaches to event history data, but 
they still have some disadvantages. First, it is necessary to decide how 
the hazard rate depends on time, and there may be little information on 
which to base such a choice. Moreover, if the hazard function is believed 
to be nonmonotonic, it may be difficult to find a model with the 
appropriate shape. Much experience with these models suggests that the 
coefficient estimates are not terribly sensitive to the choice of the hazard 
function, but one can never be sure what will happen in any particular 
situation. The second, and perhaps the more serious, problem is that 
these models do not allow for explanatory variables whose values 
change over time. While it is possible to develop fully parametric 
models that include time-varying explanatory variables (Tuma, 1979; 
Flinn and Heckman, 1982a, 1982b) estimation of these models is some- 
what cumbersome. 


Both these problems were solved in 1972 when David Cox, a British
statistician, published a paper entitled “Regression Models and
Life-Tables,” in which he proposed a model and an estimation method that
have since become extremely popular, especially in biomedical research.


The Proportional Hazards Model 


Commonly referred to as the “proportional hazards model,” Cox's 
model is a simple generalization of the parametric models we have just 
considered. We shall postpone, for the moment, a consideration of
models with time-varying explanatory variables. For two time-constant
variables, the model may be written as


    log h(t) = a(t) + b_1 x_1 + b_2 x_2                    [10]


where a(t) can be any function of time. Because this function does not 
have to be specified, the model is often described as partially parametric 
or semiparametric. It is called the proportional hazards model because 
for any two individuals at any point in time, the ratio of their hazards is a 
constant. Formally, for any time t, h_i(t)/h_j(t) = c where i and j refer to
distinct individuals and c may depend on explanatory variables but not
on time. Despite the name, this is not a crucial feature of Cox’s model
because the hazards cease to be proportional as soon as one introduces
time-varying explanatory variables.
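
The proportionality property follows directly from equation 10, because
a(t) cancels when one hazard is divided by another. The sketch below uses
an arbitrary, hypothetical function for a(t) and made-up coefficients and
covariate values.

```python
import math

def log_hazard(t, x1, x2, b1, b2, a):
    """Equation 10, with a(t) supplied as an arbitrary function a."""
    return a(t) + b1 * x1 + b2 * x2

def hazard_ratio(t, person_i, person_j, b1, b2, a):
    """Ratio of the hazards of two individuals at time t."""
    h_i = math.exp(log_hazard(t, person_i[0], person_i[1], b1, b2, a))
    h_j = math.exp(log_hazard(t, person_j[0], person_j[1], b1, b2, a))
    return h_i / h_j

def baseline(t):
    """A deliberately irregular, hypothetical choice for a(t)."""
    return math.sin(t) - 0.5 * t

b1, b2 = 0.4, -0.2
person_i, person_j = (1.0, 3.0), (0.0, 5.0)   # (x1, x2) for each person

ratios = [hazard_ratio(t, person_i, person_j, b1, b2, baseline)
          for t in (0.5, 2.0, 10.0)]
print(ratios)
```

However a(t) behaves, the ratio stays at exp(b_1(x_1i - x_1j) + b_2(x_2i - x_2j)),
which depends on the explanatory variables but not on time.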


Partial Likelihood 


Again, it is easy to write down such models but difficult to devise
ways to estimate them. Cox’s most important contribution was to
propose a method called partial likelihood which bears many similarities
to ordinary maximum likelihood estimation. Mathematical details
on partial likelihood are given in Appendix A, but some general properties
can be mentioned here. The method relies on the fact that the
likelihood function for data arising from the proportional hazards
model can be factored into two parts: One factor contains information
only about the coefficients b_1 and b_2; the other factor contains information
about b_1, b_2, and the function a(t). Partial likelihood simply discards
the second factor and treats the first factor as though it were an
ordinary likelihood function. This first factor depends only on the order
in which events occur, not on the exact times of occurrence.
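
A minimal sketch of the retained factor may make this concrete. It assumes
a single time-constant explanatory variable and no tied event times, and it
uses the standard form of the log partial likelihood: for each event, the
coefficient times that individual's x, minus the log of the sum of exp(bx)
over everyone still at risk. The data and the function name are
hypothetical.

```python
import math

def log_partial_likelihood(b, times, events, x):
    """Log partial likelihood for one covariate, assuming no tied times."""
    total = 0.0
    for i, (t_i, e_i) in enumerate(zip(times, events)):
        if e_i == 1:
            # Risk set: everyone whose observed time is at least t_i.
            risk_set = [j for j, t_j in enumerate(times) if t_j >= t_i]
            denom = sum(math.exp(b * x[j]) for j in risk_set)
            total += b * x[i] - math.log(denom)
    return total

# Hypothetical data: 1 = event, 0 = censored.
times = [2, 5, 7, 11, 13]
events = [1, 1, 0, 1, 1]
x = [1, 0, 1, 0, 0]

original = log_partial_likelihood(0.7, times, events, x)
# Squaring the times changes their values but not their order.
rescaled = log_partial_likelihood(0.7, [t ** 2 for t in times], events, x)
print(original, rescaled)
```

Because only the composition of the risk sets matters, any transformation
of the times that preserves their order leaves the function unchanged.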

The resulting estimators are asymptotically unbiased and normally 
distributed. They are not fully efficient because some information is lost 
by ignoring the exact times of event occurrence. But the loss of efficiency 
is usually so small that it is not worth worrying about (Efron, 1977).

It is difficult to exaggerate the impact of Cox’s work on the practical
analysis of event history data. In recent years, his 1972 paper has been 
cited well over 100 times a year in the world scientific literature. In the 
judgment of many, it is unequivocally the best all-around method for 
estimating regression models with continuous-time data. 

Other methods may be more appropriate in cases where a major 
substantive concern is with the dependence of the hazard on time. For 
example, the principle of cumulative inertia suggests that the longer an 
individual is in a particular state, the less likely he is to leave that state 
(McGinnis, 1968). Such a hypothesis could not be tested under partial 
likelihood estimation. In most cases, however, the main concern is with 
the effects of the explanatory variables, and the dependence on time is of 
little interest. As noted in Chapter 3, moreover, testing hypotheses 
about the effect of time on the hazard is difficult under any model 
because unmeasured sources of heterogeneity usually contaminate that 
effect. 

Computer programs for partial likelihood estimation for the propor- 
tional hazards model are now widely available as part of the SAS (SAS 
Institute, 1983) and BMDP (Dixon, 1981) statistical packages. Other 
publicly available programs to do partial likelihood are RATE (Tuma, 
1979) and SURVREG (Preston and Clarkson, 1983). The SAS supple- 
mentary procedure PHGLM is very easy to use but does not allow for 
time-varying explanatory variables. Neither does the proportional 


hazards model in RATE. 


Partial Likelihood Applied 
to an Empirical Example 

Let us return to the criminal recidivism data of Rossi et al. (1980) to
see what happens when a proportional hazards model is estimated.
Using the same explanatory variables as for the exponential regression
model, estimates were obtained using the SAS procedure PHGLM. (A
program listing is given in Appendix B.) Results are reported in panel 2
of Table 3. Both the coefficient estimates and the ratios of the estimates
to their standard errors (t-statistics) are almost identical to those
produced by the exponential regression model. This should not be too
surprising since the exponential model is just a proportional hazards
model in which the arbitrary function a(t) is fixed at a constant value.
The fact that the estimates are so similar suggests that the hazard for an
arrest does not change much over the 12-month period. Given these
results, there would be no point in estimating a Weibull regression
model: since the Weibull model falls between the exponential and
proportional hazards models in generality, the estimates would hardly
vary from those in panels 1 and 2 of Table 3.


Time-Varying Explanatory Variables 


The proportional hazards model can be extended easily to allow for 
explanatory variables that change in value over time. A model with two 


explanatory variables, one constant and one varying over time, may be 
written as 


$\log h(t) = a(t) + b_1 x_1 + b_2 x_2(t)$   [11]


where, as before, a(t) may be any function of time. This model says that 
the hazard at time t depends on the value of $x_2$ at the same time t. In some
cases, however, there may be reason to believe that there is a lag between 
a change in a variable and the effect of that change on the hazard. For 
example, if one is interested in the effect of a bout of unemployment on 
the hazard of divorce, it might be plausible to suspect that there is a lag 
between the loss of a job and an increase in the hazard. If the suspected 


lag is two months (and time is measured in months) the model can be 
modified to read 


$\log h(t) = a(t) + b_1 x_1 + b_2 x_2(t - 2)$   [12]


With or without lags, models with time-varying explanatory varia- 
bles can be estimated using the partial likelihood method described 
earlier. The derivation of the partial likelihood function is essentially the 
same with time-varying explanatory variables, but the computer algo- 
rithms for constructing and maximizing that likelihood function are 
more complex. Hence, not all programs for partial likelihood
estimation will handle time-varying explanatory variables (sometimes
referred to as “time-dependent covariates”).
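The difference between the unlagged and lagged specifications comes down to which stored value of the time-varying variable is paired with each event time. The Python sketch below illustrates this bookkeeping only; the helper `covariates_at`, the dictionary layout, and the monthly unemployment indicator are all hypothetical choices made for the example.

```python
def covariates_at(t, x1, x2_by_month, lag=0):
    """Return (x1, x2(t - lag)) for use at event time t.

    x2_by_month maps a month number to the value of the time-varying
    variable in that month. Returns None when the lagged month falls
    outside the observed record.
    """
    month = t - lag
    if month not in x2_by_month:
        return None
    return (x1, x2_by_month[month])

# Hypothetical record: 1 = unemployed in that month, 0 = employed.
unemployed = {1: 0, 2: 0, 3: 1, 4: 1, 5: 0, 6: 0}

same_month = covariates_at(5, 1.0, unemployed)             # uses x2(5)
two_month_lag = covariates_at(5, 1.0, unemployed, lag=2)   # uses x2(3)
```

With a two-month lag, the hazard at month 5 is tied to the bout of unemployment in month 3, even though the person is employed again by month 5. Note that a lag also shortens the usable record, since the earliest months have no lagged value.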

Returning to the recidivism example, we noted earlier that the
variable “weeks employed during the first three months after release” is
merely a substitute for what is actually a time-varying explanatory
variable. What one would ideally like to know is whether the hazard of
arrest is affected by employment status at any given point in time during
the one-year follow-up. This question can be addressed because the data
set includes information on employment status in each week of
observation. A proportional hazards model was therefore estimated with
employment status as a time-varying explanatory variable. (See
Appendix B for program listing.) Results are shown in panel 3 of Table 3.
For the most part the results are quite similar to those in panels 1 and
2. The big difference is in the effect of employment status, which is now
clearly the most important variable in the model. Exponentiating the
coefficient of -1.397 yields .25, which says that the hazard of arrest for
those who were working was only one fourth the hazard of those who
were not working.
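The arithmetic behind this interpretation can be checked directly. The short Python sketch below uses only the coefficient quoted in the text; the variable names are hypothetical.

```python
import math

b_employed = -1.397                        # coefficient quoted in the text
hazard_ratio = math.exp(b_employed)        # hazard if working / hazard if not
percent_change = 100 * (hazard_ratio - 1)  # percentage change in the hazard

print(round(hazard_ratio, 2))  # 0.25
```

Because the model is linear in the log of the hazard, exponentiating any coefficient gives the multiplicative effect on the hazard of a one-unit increase in the corresponding variable.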


Problems with Time-Varying 
Explanatory Variables 

A word of warning is in order here. Regardless of the computer 
program, estimation of proportional hazards models with time-varying 
explanatory variables can enormously increase computational costs. In 
this example, for instance, the CPU time increased by a factor of 10 with 
the inclusion of just one time-varying explanatory variable. Moreover, 
setting up the model may not be straightforward. With the exception of 
variables that are very simple functions of time itself, BMDP2L requires
the inclusion of a FORTRAN subroutine to define the time-varying
variables. Procedures for doing this are not well documented.

Another possible complication in the estimation of models with
time-varying explanatory variables involves the frequency with which
those variables are measured. Strictly speaking, estimation of such
models requires that for each time that an event occurs, values of the
explanatory variables must be known for all individuals at risk at that
time. Thus, if an event occurred at time 10, and 15 individuals were at
risk at that time, the values of the explanatory variables at time 10 must
be known for all 15 individuals. Typically, that would require that the
explanatory variables be measured continuously over time.

In practice, however, time-varying explanatory variables are usually
measured at regular intervals. In the example just considered,
employment status was known for each week of observation. That created no
problem because the time of arrest was measured in weeks. Difficulties
arise when time of event occurrence is measured more precisely than the
interval at which the explanatory variables are measured. For example,
event times may be measured in days but the values of the explanatory
variables may be known only at the beginning of each month.

In such cases, some ad hoc procedure will be necessary to estimate the
values of the explanatory variables at the times of events. The simplest
approach is to use the value closest in time to the event time as the
estimated value. A better method is to use linear interpolation, which is
equivalent to weighted averaging. Suppose, for example, that the values
of an explanatory variable x are known at time 10 and time 20 but an
event occurs at time 13. An estimate of x(13) is given by x(10)(.7) +
x(20)(.3). For a discussion of these and other methods, see Tuma (1982).
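The two procedures just described can be sketched in a few lines of Python. This is only an illustration: the function names and the values assigned to x(10) and x(20) are hypothetical.

```python
def nearest_value(t, t0, x0, t1, x1):
    """Use the measurement closest in time to t (ties go to the earlier one)."""
    return x0 if (t - t0) <= (t1 - t) else x1

def interpolated_value(t, t0, x0, t1, x1):
    """Linear interpolation between measurements at times t0 < t1."""
    if not (t0 <= t <= t1) or t0 == t1:
        raise ValueError("t must lie between two distinct measurement times")
    w1 = (t - t0) / (t1 - t0)   # weight on the later measurement
    return (1 - w1) * x0 + w1 * x1

# Hypothetical measurements: x(10) = 4.0 and x(20) = 9.0; event at time 13.
x13_nearest = nearest_value(13, 10, 4.0, 20, 9.0)      # takes x(10)
x13_linear = interpolated_value(13, 10, 4.0, 20, 9.0)  # .7*x(10) + .3*x(20)
```

For these values the interpolated estimate is .7(4.0) + .3(9.0) = 5.5, whereas the nearest-value rule simply carries x(10) = 4.0 forward.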


Adequacy of the Proportional Hazards Model 


Many researchers worry about whether their data satisfy the
proportional hazards assumption. For those with such concerns, there are
ways of both assessing the validity of this assumption and altering the
model to correct for violations. Before discussing these methods,
however, let us consider the possibility that these worries may be
exaggerated. As models go, the proportional hazards model is
extraordinarily general and nonrestrictive, which is the main reason for
its great popularity. Even when the proportional hazards assumption is
violated, it is often a satisfactory approximation. Those who are
concerned about misspecification would usually do better to focus on the
possibilities of omitted explanatory variables, measurement error in the
explanatory variables, and nonindependence of censoring and the
occurrence of events.

With that in mind, consider what a violation of the assumption means:
there is an interaction between time and one or more explanatory
variables, e.g.,


$\log h(t) = a(t) + bx + cxt$   [13]


This model differs from the usual model by including the product of x
and t as one of the explanatory variables. If c is positive, we can say
that the effect of time on the hazard increases linearly as x increases.
Alternatively, we can say that the effect of x on the hazard goes up
linearly with time. Hence, the hazards are not proportional if the effect
of some explanatory variable on the hazard is different at different
points in time.

More generally, one can estimate models of the form


$\log h(t) = a(t) + bx + c\,g(x,t)$   [14]


where g(x,t) is some nonlinear function of x and t with known
parameters. One simply computes g(x,t) for each event time t and includes
this in the model as a time-varying explanatory variable. Thus, one
test for nonproportionality is to estimate models of this type and
then test whether the coefficient c differs significantly from
zero. If it does, then one has already solved the problem by estimating
the extended model.

A limitation of this approach is that not all partial likelihood
programs allow time-dependent explanatory variables. Moreover,
estimation of such models tends to be expensive. There is a much cheaper
graphical method that works well when time interacts with a categorical
(nominal) variable, or one that can be treated as categorical. In this
method, a certain function of time (the log-log survival function) is
plotted for subsamples corresponding to the different levels of the
categorical variable. If the categorical variable is sex, for example, one
plot is produced for males and another plot is produced for females.
These plots should be roughly parallel if the proportional hazards
assumption is satisfied. (For further information see Lawless, 1982, and
the BMDP2L manual.)
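The quantities plotted in such a check can be computed by hand for small samples. The Python sketch below is a rough illustration under stated assumptions: it uses the Kaplan-Meier estimate of the survival function and the transformation log(-log S(t)), which is one common reading of “log-log survival function”; the function names and the two small subsamples are hypothetical.

```python
import math

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct observed event time."""
    surv = 1.0
    curve = []
    for t in sorted({u for u, d in zip(times, events) if d}):
        at_risk = sum(1 for u in times if u >= t)
        occurred = sum(1 for u, d in zip(times, events) if u == t and d)
        surv *= 1 - occurred / at_risk
        curve.append((t, surv))
    return curve

def log_minus_log(curve):
    """Transform S(t) to log(-log S(t)); undefined when S is 0 or 1."""
    return [(t, math.log(-math.log(s))) for t, s in curve if 0 < s < 1]

# Hypothetical subsamples (event indicator 1 = event, 0 = censored).
male_times, male_events = [2, 4, 6, 8, 10], [1, 1, 0, 1, 1]
female_times, female_events = [3, 5, 7, 9, 12], [1, 0, 1, 1, 0]

male_curve = kaplan_meier(male_times, male_events)
male_points = log_minus_log(male_curve)
female_points = log_minus_log(kaplan_meier(female_times, female_events))
```

Plotting `male_points` and `female_points` against time (or log time) gives the two curves to compare. Under proportional hazards the vertical distance between them is the same at every t, because the log of the cumulative hazard for one group is that of the other plus a constant.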
If this graphical test provides evidence against the proportional
hazards assumption, there is a method called “stratification” that allows
for nonproportionality. It is much cheaper than using time-varying
explanatory variables. The basic idea is to divide the sample into strata
according to the values of the categorical variable that interacts with
time. A separate model is then postulated for each stratum.
Suppose, for example, that males and females are thought to have
nonproportional hazards. We then specify two models:


males: $\log h(t) = a_1(t) + b_1 x_1 + b_2 x_2 + \cdots$


females: $\log h(t) = a_2(t) + b_1 x_1 + b_2 x_2 + \cdots$   [15]

These models share the same set of b coefficients but each has a different
(unspecified) function of time. Both models can be estimated
simultaneously using the partial likelihood method. Stratification is
available in the BMDP2L program, the 1983 version of the SAS PHGLM
procedure, and in the SURVREG program (Preston and Clarkson, 1983).

Another approach to checking the adequacy of the model is the
examination of residuals. Methods for calculating residuals for the
proportional hazards model have been proposed by Kay (1977) and
[...] (1982). Kay’s formula has been incorporated into the
BMDP2L program, which also plots those residuals in such a way that
deviations from a straight line represent a failure of the model.
Unfortunately, a recent investigation (Crowley and Storer, 1983) raises
questions about the diagnostic value of such plots.
Those accustomed to multiple regression will undoubtedly wish that
there were something like R² for proportional hazards models. Harrell
(1980) has developed an R² analog for these models, and has
incorporated it into the SAS supplementary procedure PHGLM. From his
description, however, it is not clear that this is the most appropriate
analog, and others may yet be developed. In any case, it must be
emphasized that such a statistic does not measure how well the
assumptions of the model are satisfied by the data. As in ordinary multiple
regression, a low R² is quite compatible with a model that is perfectly
specified, and a high R² can be found for models that are grossly
misspecified. Such statistics only tell one how much variation in the
dependent variable is attributable to variations in the explanatory
variables.


Choice of Origin of the Time Scale 


One aspect of model choice that has not yet been discussed is the
question of when time begins. [...] In the recidivism example, the
origin of the time scale was relatively unambiguous. The date of release
is a natural starting point for calculating time of first arrest after
release. Similarly, if one is estimating models for the hazard of divorce,
the date of marriage is a natural starting point for calculating time of
divorce.

[...]


In theory, one could formulate and estimate proportional hazards 
models in which the hazard depended arbitrarily on two or more time 
scales. In practice, this requires very large samples or special conditions 
described by Tuma (1982). Even when this approach is not possible, 
however, one can always explicitly introduce different time scales as 
explanatory variables. For example, in estimating a proportional 
hazards model for divorce in which the hazard varies arbitrarily with 
duration of marriage, one could also include calendar year, age of 
husband, and age of wife as explanatory variables. If any of these 
variables has a nonlinear effect (on the log of the hazard), it is necessary 
to specify the variable as a time-varying explanatory variable. When the 
effect is linear (on the log of the hazard), however, it is sufficient to 
measure the variable at the beginning of the principal time scale, in this 
case the beginning of the marriage. Another example is provided by
Table 3 where age at release is included as an explanatory variable. 


Partial Likelihood for Discrete-Time Data 


The discussion of the partial likelihood method has assumed that
time is measured on a continuous scale and that, as a consequence, two
events cannot occur at exactly the same time. In practice, time is always
measured in discrete units, however small, and many data sets will
contain “ties”: two or more individuals experiencing events at
apparently the same time. Thus, in the recidivism example where time was
measured in weeks, there were several weeks in which two or more
persons were arrested.

To handle such data, both the model and the partial likelihood
method must be modified to some degree. The model proposed by Cox
(1972) for data with ties is just the logit-linear model of equation 3 in
Chapter 2. This model is attractive because, as the discrete-time units get
smaller and smaller, it converges to the proportional hazards model
(Thompson, 1977).


The method of partial likelihood may be used to estimate this model
but, if the number of ties is at all large, the computational requirements
are gargantuan. To avoid this, a number of approximations have been
proposed, the most widely accepted of which is Breslow’s (1974). This
formula has been incorporated into most programs for partial likelihood
estimation, including all those mentioned here. In the absence of
ties, Breslow’s formula reduces to the usual partial likelihood for
continuous-time data. Thus, such programs can handle either
continuous- or discrete-time data, and there is usually no need for the
researcher to be concerned about the occurrence of ties.

This approach is not fool-proof, however. If the number of events
occurring at most time points is large relative to the number at risk (say,
50 percent or more), the Breslow approximation will be poor (Farewell
and Prentice, 1980). In such situations, it would be better to use the
discrete-time estimation method described in Chapter 2.
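For readers who want to see how ties enter the calculation, the Python sketch below writes out the log partial likelihood with the Breslow treatment of ties in its usual textbook form: all events tied at a given time share one risk set, and the log of the risk-set sum is counted once per tied event. The function name and toy data are hypothetical, and this is not the code of any package named in the text.

```python
import math
from collections import defaultdict

def breslow_log_partial_likelihood(times, events, X, b):
    """Log partial likelihood using the Breslow approximation for ties."""
    score = [sum(bk * xk for bk, xk in zip(b, row)) for row in X]

    # Group the individuals who experienced an event by their event time.
    tied = defaultdict(list)
    for i, (t, d) in enumerate(zip(times, events)):
        if d:
            tied[t].append(i)

    total = 0.0
    for t, cases in tied.items():
        # Everyone still under observation at t is in the risk set,
        # including all of the individuals tied at t.
        risk_sum = sum(math.exp(score[j]) for j, u in enumerate(times) if u >= t)
        total += sum(score[i] for i in cases) - len(cases) * math.log(risk_sum)
    return total

# Hypothetical data: two events tied at time 3, one covariate.
times = [3, 3, 5, 8]
events = [1, 1, 1, 0]
X = [[1.0], [0.0], [1.0], [0.0]]

ll_breslow = breslow_log_partial_likelihood(times, events, X, [0.7])
```

When every event time is distinct, each group in `tied` holds a single case and the expression reduces to the ordinary partial likelihood, which is the property noted in the text.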


5. MULTIPLE KINDS OF EVENTS 


In the previous chapters all the events under study were treated as
though they were exactly alike. Thus, in Chapter 2 we did not
distinguish among different kinds of job changes, and in Chapter 3 all arrests
were treated the same. Often this will not do. In some cases, lumping
together different kinds of events may be completely inappropriate.
Even when it is appropriate, however, a more refined analysis in which
different kinds of events are examined separately is often desirable.

Fortunately, no new methodology is required. The methods already
discussed for single kinds of events can also be used with multiple kinds
of events; they just have to be applied in more complicated ways.
Unfortunately, there is still much confusion surrounding this topic, and
I believe that it is largely due to the fact that there are multiple kinds of
“multiple kinds of events.” This phrase really describes several quite
different situations, each requiring a different approach to analysis.


A Classification of Multiple Kinds of Events 


next chapter. 
The first major class may be described as follows:


I. The occurrence or nonoccurrence of an event is determined by
one causal process; given that an event occurs, a second causal
process determines which type occurs.


It is easy to think of examples that fall in this class. Consider the event
of buying a car, and suppose we distinguish buying an American car
from buying a foreign car. It is unlikely that distinct causal processes
lead to each of these kinds of events. Rather, one first decides to buy
a car; then, independent of that decision, one decides whether it is
to be an American or a foreign car. It is likely that different
explanatory variables affect each of these decisions. Another example is
provided by the event of a visit to a physician, where we distinguish
visits to osteopaths from visits to M.D.s. Again it is most likely that the
decision to visit some kind of physician is distinct from the decision as to
what kind to visit. It doesn’t matter which precedes the other;
the important point here is that one decision is distinct from the other.

For this class of “multiple kinds of events,” the appropriate strategy
for analysis is to follow the structure of the causal process. One uses the
event history methods described in the previous chapters to model the
occurrence of the event, making no distinction among event types. Then,
looking only at those individuals for whom events occurred, one uses an
appropriate technique for modeling the process which determined the
type of event. A natural choice would be binomial logit analysis, or
multinomial logit analysis if there are more than two event types.

The second broad class of “multiple kinds of events” consists of those
situations in which the following is true:

II. The occurrence of each event type has a different causal structure.


A different causal structure means either that different explanatory
variables affect the occurrence of each event type, or that the same
explanatory variables have different coefficients or different functional
forms. Rather than providing examples of this general class, let us
proceed to the four subclasses. We shall also postpone describing the
method of analysis until all four subclasses have been described.


IIa. The occurrence of one event type removes the individual from
risk of the other event type.


Often described as “competing risks,” this class has received much
attention from biostatisticians and demographers. The classic example
is death from competing causes. Clearly there are different causal
processes leading to death from heart disease and death from cancer. Yet a
person who dies of heart disease is no longer at risk of dying of cancer,
and vice versa. There are also many examples in the social sciences. It is
very likely, for instance, that voluntary job terminations occur as a
result of different causal processes than involuntary job terminations, if
only because different decision makers are involved in the two types of
termination. Yet once a person quits a job, he or she is no longer at risk
of being fired. And once fired, a person no longer has the option of
quitting. A similar example is marital dissolution, where divorce is
distinguished from death of a spouse.


IIb. The occurrence of one type of event removes the individual from 
observation of other event types. 


In studies of human migration, one might distinguish moves within a 
country from moves between countries. It would not be uncommon for 
individuals to be lost to further study if they moved out of a country. 
Even though such an individual is no longer observed, he or she would 
still be at risk of a within-country move. 

This example is asymmetric in that individuals who move within a
country are still at risk of a between-country move and may continue to
be observed. It is easy to imagine symmetric cases, however. For
example, a study of criminal recidivism may distinguish arrests for violent
and nonviolent crimes. If the follow-up does not continue past the first
arrest, then the study would fall into class IIb.


IIc. The occurrence of one kind of event affects neither the risk of
occurrence nor the observation of other kinds of events.


IId. The occurrence of one kind of event raises or lowers (but not to
zero) the hazard of the other kind of event.


[...] hazard of quitting a job. Becoming employed reduces the hazard of
being arrested.


Estimation for Multiple Kinds of Events


We now consider how to deal with these four subclasses. Class IIc is 
easy. If the occurrence of an event has no effect either on the observation 
or risk of occurrence of another event, then the first event can be 
completely ignored in studying the second event. Thus, this class is 
effectively the same as that discussed in the previous chapters. 

On the other hand, if the occurrence of one event raises or lowers the 
hazard of the occurrence of the other event (class IId), then surely the 
first event must be taken into account in studying the second event. 
Again, however, we already have a method for doing this. The trick is to 
use the occurrence of the first kind of event to define a time-varying 
explanatory variable in the analysis of the second kind of event. Thus, in 
the biochemistry example, a time-varying dummy variable indicating a 
change in rank was used to predict the occurrence of an employer 
change. Similarly, in the analysis of arrests, a dummy variable was
included to indicate whether or not the person was employed in each
week. Alternatively, one could create a variable measuring the length of
time since employment began. In fact, there is no reason why both
variables could not be included as explanatory variables.
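Both kinds of time-varying variable can be generated from the date of the first event. The Python sketch below is a simplified illustration: the function name and record layout are hypothetical, and it assumes that employment, once begun, continues to the end of follow-up, which real data need not satisfy.

```python
def employment_covariates(start_week, n_weeks):
    """Build weekly rows of (week, employed dummy, weeks since job began).

    start_week is the week in which employment began, or None if the
    person was never employed during follow-up.
    """
    rows = []
    for week in range(1, n_weeks + 1):
        employed = int(start_week is not None and week >= start_week)
        duration = week - start_week if employed else 0
        rows.append((week, employed, duration))
    return rows

# Hypothetical person who starts a job in week 3 of a 5-week record.
rows = employment_covariates(start_week=3, n_weeks=5)
never_employed = employment_covariates(start_week=None, n_weeks=3)
```

The dummy captures a shift in the hazard at the moment employment begins, while the duration variable lets that effect grow or fade with time on the job; including both allows the data to distinguish the two patterns.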

Class IIb is quite similar to what has already been described as
censoring: An individual is removed from observation at some point
prior to the occurrence of the event of interest. The difference here is that
censoring is now an event in its own right. Despite this difference, the
best available analytic strategy remains the same. Each event type is
analyzed separately using the models and methods of the previous
chapters. Events that remove the individual from observation are treated
just as if the individual were censored at that point in time. Thus,
in analyzing the causes of within-country migration, individuals who
made between-country moves (and hence could not be followed up)
would be considered censored at the time of the between-country move.

Although I know of no better way to handle this situation, it is not
entirely unproblematic. Recall that all the event history methods
described so far require that censoring times be independent of event
times. That requirement still stands even though censoring is now a
distinct type of event. Thus, in the migration analysis just described, it
must be assumed that times of between-country moves are independent
of times of within-country moves, an assumption that may not be
plausible. While it is possible to formulate and estimate models that
build in some dependence, it is impossible to distinguish them
empirically from models that specify independence.

Class IIa (competing risks), in which the occurrence of an event
removes the individual from risk of other events, is the one most
commonly discussed in the event history literature. Accordingly, it is the
one to which we shall devote the most attention. It is superficially similar
to class IIb, just considered, and indeed the basic message is the same.
Methods for single kinds of events may be applied separately to each
kind of event. In analyzing each event, the individual is treated as
censored at the occurrence of any other kind of event. Because of the
importance of this result, let us spend a little time on the background
and the argument. Then we can proceed to an example.

Models for Competing Risks 


There are several ways to approach the modeling of competing risks,
but the most common is to begin by defining
“type-specific” (or “cause-specific”) hazard functions. Suppose there are
m different kinds of events and let j = 1, ..., m [...]
have occurred prior to t.
The type-specific hazard rate is then defined as


$h_j(t) = \lim_{s \to 0} P_j(t, t + s)/s$   [16]


[...] each kind of event. In [...]
function (which is maximized [...]
separately for each event type, using the methods described in the
previous chapters.

That does not mean that estimating separate models for each type of
event is the only way to proceed. In fact the RATE program does
simultaneous estimation for multiple kinds of events. The importance of
this result is that, at a theoretical level, there is nothing really new about
models for competing risks. And at a practical level, estimating models
separately for each kind of event gives one much greater flexibility and
control over the kinds of models estimated. For example, one could
specify a Weibull regression model for one kind of event and a
Gompertz regression model for another kind of event. Or more likely, the
models for different kinds of events might have different explanatory
variables or the same explanatory variables transformed in different
ways. Most importantly, one can ignore event types that are of little or
no interest. In a study of marital dissolution, for example, if one is only
interested in the causes of divorce, it is not necessary to estimate models
for both divorce and death of a spouse.


An Empirical Example of Competing Risks 


For an example of a competing risks analysis, we again consider data
from a study of criminal recidivism. Known as TARP (Rossi et al.,
1980), the study was a large-scale replication of the experiment
described and analyzed in Chapter 3. Approximately 4000 inmates released
from prisons in Texas and Georgia were randomly assigned to
experimental treatments involving various levels of financial aid and job
[...] assistance. They were followed for one year after release, when a
search was made of public records for any arrests which occurred during
the follow-up year. For this example, the analysis is restricted to [...]
convicts interviewed during the follow-up [...]. The event of
interest is the first arrest to occur after release, so that those with no
arrests during the one-year period are treated as censored.

Different types of arrests will be distinguished, according to the
crime allegedly committed. In one phase of the analysis we
distinguish crimes against property (robbery, burglary, [...])
from all others. In a later phase, we shall further subdivide
nonproperty crimes into violent and nonviolent crimes.


Continuous-time methods are most appropriate since the exact day
of each arrest is known. Using the SAS PHGLM procedure, we shall
estimate proportional hazards models of the form


$\log h_j(t) = a_j(t) + b_{j1} x_1 + b_{j2} x_2 + \cdots$   [17]

where the j subscripts indicate that there is a different set of coefficients
and a different arbitrary function of time for each arrest type.
Explanatory variables include education, marital status at time of release, age at
release, age at earliest known arrest, sex, number of previous
convictions for crimes against persons, number of previous convictions for
crimes against property, a dummy variable indicating whether the
incarceration was for a crime against persons, a dummy variable
indicating whether the incarceration was for a property crime, and a dummy
variable indicating whether or not the person was released on parole.
The analysis is simplified by the fact that none of these is a time-varying
explanatory variable.

We begin by estimating a model which does not distinguish different
kinds of arrests. There were 340 persons with at least one arrest, so the
remaining 615 persons with no arrests were censored at 365 days. The
estimated coefficients are given in Table 4, column 1. Only four of the 11
explanatory variables are significant at the .05 level or beyond: age at
initial arrest, number of previous property convictions, incarceration
for a property offense, and release on parole. Note that the dummy
variable for financial aid does not have a significant effect, and even has
a sign opposite that which was expected. Thus it appears that financial
aid is not effective in reducing recidivism.


This initial model is unsatisfactory, however, because theory suggests
that financial aid should reduce property offenses but not other
offenses. It is also possible that other variables may have different effects
on different kinds of arrests. To examine this possibility, we subdivide
arrests into 192 property arrests and 148 nonproperty arrests, and then
estimate a separate proportional hazards model for each type of arrest.
When estimating the model for property arrests, persons whose first
arrest was for a nonproperty arrest are treated as censored at the time of
that arrest. Similarly, in the model for nonproperty arrests, property
arrests are equivalent to censoring.


[...] are different from first arrests.

TABLE 4
Estimates of Proportional Hazards Models for Different Arrest Types

                                      1          2           3          4        5
Explanatory Variables            All Arrests  Property  Nonproperty  Violent   Other

Education                          -.022       -.006       [?]        [?]      [?]
Financial aid (D)a                  .108        .215      -.050       [?]      [?]
Imprisoned for crime
  against person (D)                .080        .062       .087       [?]      [?]
Imprisoned for crime
  against property (D)              .449**      .889**    -.005       .300     [?]
Number of convictions for
  crimes against persons            .124       -.089       [?]        [?]      [?]
Number of convictions for
  crimes against property           .239[?]     .242**     [?]        [?]      [?]
Paroled (D)                         .273[?]     .167       .414*      .173     .620*
Male (D)                            .21[?]      .203       .322       .214     .427
Age at earliest arrest             -.043**     -.051**     [?]        [?]      [?]
Married (D)                         .053       -.036       .167       [?]      [?]
Age at release                     -.009       -.010       [?]        [?]      [?]

N of arrests                        340         192        148        [?]      55
Total N                             955         955        955        955      955

a. (D) indicates dummy variable.
*Significant at .05 level.
**Significant at .01 level.

Results from estimating models for the two kinds of arrests are shown
in columns 2 and 3 of Table 4. The effects of age at initial arrest and
number of previous property convictions are approximately the same
for each type of arrest. On the other hand, release on parole has a
significant effect on nonproperty arrests but no effect on property
arrests. Moreover, previous incarceration for a property crime
substantially increases the risk of being arrested for a property crime but not for
a nonproperty crime. Financial aid has no significant effect on either
type of arrest.

Although we could stop the analysis at this point, further insight is
obtained by subdividing the nonproperty arrests into two types:
violent crimes against persons and all other offenses. (This residual
category consists, for the most part, of such relatively minor offenses as
possession of marijuana, carrying a concealed weapon, and “neglect of
family.”) A separate model is then estimated for each type of arrest, with
results shown in columns 4 and 5 of Table 4. Note that only one variable
has a significant effect for each arrest type. This is a consequence of the
fact that overall significance levels will go down as the number of events 
becomes a smaller proportion of the total sample size. What is impor- 
tant here is that the effect of release on parole is large and significant for 
the “other” category but small and nonsignificant for violent crimes 
against persons. Similarly, while the number of previous property con- 
victions has a significant effect on the hazard for violent offenses, it has 
no effect on other types of nonproperty offenses. 

We see, then, that distinguishing different kinds of events can lead to 
different conclusions about the effects of explanatory variables. Sim- 
ilarly, the failure to distinguish among event types may produce mislead- 
ing results. For example, in the model for all kinds of arrests, we found 
that those released on parole had a higher hazard for being arrested. 
Nevertheless, when we focused on specific types of events, the effect of 
parole status was significant only for relatively minor offenses. 


Dependence Among Different Kinds of Events 


The approach we have just described and applied for class IIa,
competing risks, has one very important property. The type-specific
hazard functions are defined in such a way that it is unnecessary to make
any assumptions about dependence or independence among different 
kinds of events. To understand what this means, consider again the 
example of job terminations where we distinguish voluntary termina- 
tions from involuntary terminations. The two types would be dependent 
if, for example, persons who knew they were likely to be fired were more 
likely to quit, possibly to avoid the stigma of being fired. Or it could 
work the other way. Persons who wanted to quit might arrange to be 
fired in order to collect unemployment insurance. Neither case poses 
any problems for the kind of model which we have just considered. 

While this is an attractive feature, it can be argued that the model
solves the problem of dependence versus independence by simply
defining it away. In fact, there are other approaches to competing risks
in which the problem of dependence is crucial and, in some respects,
insoluble. It can be shown, for example, that it is impossible to
distinguish empirically models in which different kinds of events are
dependent from a model in which they are independent (Tsiatis, 1975).
While the mathematics of these different approaches is thoroughly
developed, the interpretation and implications for empirical research
are still controversial. (For a detailed discussion, see Kalbfleisch and
Prentice, 1980: Ch. X)
6. REPEATED EVENTS


Most events studied by social scientists are repeatable. Examples
include job changes, marriages, divorces, arrests, convictions, and
visits to a physician. Unfortunately, the literature on event history
analysis that has come out of biostatistics contains only a handful of
treatments of repeated events (e.g., Gail, Santner, and Brown, 1980;
Prentice, Williams, and Peterson, 1981). As already noted, this is a
consequence of the fact that the events of greatest interest in
biomedical research are deaths. While sociologists (Tuma, Hannan, and
Groeneveld, 1979) and economists (Flinn and Heckman, 1982a, 1982b)
have made some progress in this area, there is still much to be done in
the way of developing methods that are suitable for repeated events.

One approach that is sometimes appropriate is to conduct a separate
analysis for each successive event using any of the methods already
discussed. In a study of marital fertility, for example, one could
estimate one model for the interval between marriage and first birth, a
second model for the interval between first and second birth, and so on.
This approach requires no special assumptions, and is especially useful
if one expects the process to differ from one event to another. On the
other hand, if the process is essentially the same across successive
events, doing a separate analysis for each event is both tedious and
statistically inefficient.


A Simple Approach


In this chapter we shall focus on a second approach that avoids these
difficulties by treating each interval between events for each individual
as a separate observation. These intervals (sometimes referred to as
spells) are then pooled over all individuals. At this point any of the
methods described in the previous chapters can be applied. Although
this approach is not entirely satisfactory from the viewpoint of
statistical theory, we shall postpone a discussion of those problems
until later in this chapter. There are also a number of possible
complications not discussed in earlier chapters.

To simplify the discussion, let us assume that the repeated events are
of a single kind. (In the next chapter we shall consider models that
allow for both multiple kinds of events and repeated events.) We begin
by extending the empirical example discussed in Chapter 5. Recall
that the sample consisted of approximately 1000 inmates released from
Georgia state prisons who were observed for one year after their release.
In the previous analysis, the event of interest was the first arrest that
occurred after release. As Table 5 shows, however, many of the subjects
were arrested more than once during the one-year follow-up period, and
to ignore the later arrests seems wasteful of information. It also raises
the question of whether the causal process differed for earlier and later
arrests.


TABLE 5
Frequency Distribution for Arrests

Number of Arrests     Number of Persons
        0                    622
        1                    213
        2                     85
        3                     25
        4                      9
        5                      5
        6                      2


To incorporate these additional arrests into an event history analysis,
let us divide each individual’s one-year follow-up period into intervals,
using the observed arrests as the dividing points. Consider, for example,
a person with two arrests that occurred at times marked by Xs on the line
below:
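
The division of the follow-up period into intervals can also be sketched in code. The following is a minimal illustration, not the procedure actually used for the analysis: the function name, the 365-day follow-up, and the example arrest times are assumptions. The tally at the end uses the frequencies in Table 5 and assumes that every person also contributes one final censored interval.

```python
def split_into_intervals(arrest_times, follow_up_end=365):
    """Split one person's follow-up into (start, stop, arrested) intervals.

    Each arrest closes an interval with arrested = 1; the time remaining
    after the last arrest forms a final censored interval (arrested = 0).
    """
    intervals = []
    start = 0
    for t in sorted(arrest_times):
        intervals.append((start, t, 1))
        start = t
    if start < follow_up_end:
        intervals.append((start, follow_up_end, 0))
    return intervals


# A hypothetical person arrested on days 100 and 250.
example = split_into_intervals([100, 250])

# Table 5: number of arrests -> number of persons.
arrest_counts = {0: 622, 1: 213, 2: 85, 3: 25, 4: 9, 5: 5, 6: 2}
persons = sum(arrest_counts.values())
arrests = sum(k * n for k, n in arrest_counts.items())
# One interval per arrest plus one final censored interval per person.
intervals = persons + arrests
```

Under these assumptions the pooled data set contains 1492 intervals, which agrees with the number of cases that appears in Table 6.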
TABLE 6
Estimates of Proportional Hazards Models for Repeated Arrests 
Second or 
All Arrests Later Arrests 
g 1 2 3 
Explanatory Variables b t b t b £ 
Education —.008 —.35 —.010 45 020 52 
Financial aid (D)* 150 1.69 .136 «1.54 142 95 
Imprisoned for crime 
against person (D) 173 1.25 153 LI 207 92 


Imprisoned for crime 
against property (D) 


Number of convictions for 
crimes against persons .051 10 -.002 00 162 fe 
Number of convictions for 
crimes against property as 3.27% ass 2.13" 061 66 
Paroled (D) Say 36s «9312 3a" aan l 75 
Male (D) 054.26 076037 elm M 
Age at earliest arrest cga aoe 038 Se -0 1A 
Married (D) 160 1.50 151 1.42 .293 LA 
Age at release Soos i17 008114 00 3 
Number of prior intervals £ y 2e oh a1 
Time since release - 0172.05" =00 10 
N or arrests 549 a a 
1492 1492 eae 


a. (D) indicates dummy variable.
*Significant at .05 level.
**Significant at .01 level.


percent. It is reasonable to expect, then, that the new estimates will
have smaller standard errors and, hence, larger t-statistics. Although
the basic pattern of results is the same, we do find somewhat
larger t-statistics when all the arrests are included. In fact, the positive
effect of financial aid is now marginally significant by a one-tailed test.


Problems with Repeated Events


While the method just described is straightforward and intuitively
appealing, it requires that one make a number of assumptions that may
well be problematic. First, one must assume that the dependence of the
hazard on time since last event has the same form for each successive
event. Recall that in a proportional hazards model,


     log h(t) = a(t) + b_1 x_1 + b_2 x_2 + ...                    [18]

where t is the length of time since the last event and a(t) is an unspecified
function of time. Even though it is unspecified, a(t) must be the same 
function for the first arrest, the second arrest, and so on. Or if one 
assumes that intervals have a Weibull distribution, that distribution 
must have the same shape parameter for each successive interval. If 
there is reason to be suspicious of this assumption, the proportional 
hazards model can be modified to make it unnecessary. The basic idea is 
to let the function a(t) be different for each successive interval, while 
forcing the b coefficients to be the same. Such a model can be readily 
estimated using the method of stratification discussed in Chapter 4. 
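
In the notation of equation 18, the modified model might be written as follows. The subscript k, indexing the first, second, third, ... interval for an individual, is notation introduced here for illustration only.

```latex
\log h_k(t) = a_k(t) + b_1 x_1 + b_2 x_2 + \cdots , \qquad k = 1, 2, 3, \ldots
```

Each a_k(t) remains unspecified, so the intervals are stratified by their order, while the b coefficients carry no subscript k and are therefore common to all strata.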

A second assumption implicit in this method is that, for each individ-
ual, the multiple intervals must be statistically independent. In general,
we would expect that people who are frequently arrested (i.e., have short 
intervals) will continue to be frequently arrested. This does not violate 
the assumption of independence, so long as that dependence is fully 
accounted for by the explanatory variables included in the model. 

In most cases, however, there will be good reason to think that the
independence assumption is false, at least to some degree. The
consequences of violating this assumption have not been studied, but
analogies with linear regression suggest that (a) the coefficient
estimates will still be asymptotically unbiased and (b) standard error
estimates will be biased downward. Work is now being done on ways to
relax the independence assumption by introducing a random disturbance
term that is correlated across intervals (Flinn and Heckman, 1982a,
1982b). This approach has not progressed to the point where the new
methods can be generally recommended, however.


In the meantime, there are some things that can be done to minimize
the consequences of violating the independence assumption. [...] the
length of the previous interval, [...] is observed. [...] adjusted [...]
by multiplying each one by the square root of [...]/N. The rationale for
this adjustment is that, if intervals are highly dependent, multiple
intervals for a single individual are redundant: an individual with many
intervals is not contributing much more information than an individual 
with just one interval. The approach is highly conservative, however, 
and probably results in overestimates of the true standard errors. 
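
As a rough illustration of an adjustment of this kind, the sketch below inflates a standard error so that it reflects the number of individuals rather than the number of intervals. It rests on an assumption that is ours, not a formula quoted from the text: that standard errors shrink in proportion to the square root of the sample size. The counts are the 1492 intervals of Table 6 and the 961 persons obtained by summing Table 5; the function names and the example standard error are hypothetical.

```python
import math


def inflate_standard_error(se, n_intervals, n_individuals):
    """Rescale a standard error as if only n_individuals observations
    had been available instead of n_intervals (assumed 1/sqrt(n) scaling)."""
    return se * math.sqrt(n_intervals / n_individuals)


def deflate_t_statistic(t, n_intervals, n_individuals):
    """Equivalent adjustment expressed on the t-statistic."""
    return t / math.sqrt(n_intervals / n_individuals)


factor = math.sqrt(1492 / 961)
adjusted_se = inflate_standard_error(0.10, 1492, 961)
```

With these counts every standard error would be multiplied by a factor of roughly 1.25, and every t-statistic divided by the same factor.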

A third limitation of models for repeated events considered thus far is 
that the hazard rate is expressed as a function of the time since the last 
event. While this is by far the most common specification, there are 
often situations in which it is more plausible to let the hazard vary as a 
function of age or time since some common starting point. In studies of 
fertility, for example, age may have stronger effects on the hazard for a 
birth than length of time since the previous birth. We have discussed the 
problem of starting points in Chapter 4, but some additional remarks 
are in order here. First, the question of origin of the time scale is always 
ambiguous in the case of repeated events and should always be given 
careful consideration. Second, models for repeated events in which the 
hazard depends on time since some fixed starting point may be inconve- 
nient to estimate. Consider, for example, the proportional hazards 
model in equation 18 where we now consider t to be time since release 
from prison rather than time since the last event. Such a model can, in 
principle, be estimated by the partial likelihood method, but the
commonly available partial likelihood programs do not have that
capability. See Prentice, Williams, and Peterson (1981) for further
details. Perhaps the best approach, at present, is to include age or time
since some other point as an explanatory variable in models that allow the


hazard to vary with time since the last event. 


Extending the Recidivism Example


Let us now return to the recidivism example to illustrate some of
the new possibilities just discussed. In panel 2 of Table 6, estimates are
given for a model that includes number of prior arrests and length of
time from release to the beginning of each interval. Both
variables are statistically significant. The positive effect of number of
prior arrests indicates, as expected, that those with many arrests have a
higher hazard for arrest at subsequent points in time. The positive effect
of time since release indicates a tendency for the hazard to increase over
the one-year observation period. The inclusion of these variables
somewhat attenuates the coefficients and t-statistics for the other
variables in the model. In fact, the t-statistics are about the same as
those for the model which only examined the first arrest. Some
attenuation is to be expected since the violation of the independence
assumption is likely to lead to inflated t-statistics.

When corrections are introduced for possible violations of assump-
tions, it appears that not much has been gained by analyzing all arrests
rather than just the first arrest. This is surprising since the inclusion of
the additional arrests ought to have yielded diminished standard errors
and, hence, increased t-statistics. One possible explanation is that the
causal process may be somewhat different for arrests after the first. To
examine this possibility, the second model was reestimated after exclud-
ing all the intervals from release to first arrest. This left 531 intervals of
which 231 ended in arrest. Results are shown in panel 3 of Table 6. With
the exception of parole status, none of the explanatory variables even
approaches statistical significance. It is reasonable to expect some
decline in significance level since the effective sample size has been
reduced greatly. Nevertheless, the coefficients of the formerly significant
variables are also greatly attenuated, suggesting that there has been a
real decline in the effects of these variables for later arrests. We shall not
speculate on the reasons for this decline.


Left Censoring 


Before leaving the topic of repeated events, let us consider one further
problem that is quite common but did not occur in this particular
example. The problem is often referred to as “left censoring,” but it is
worth noting that biostatisticians mean something quite different when
they use this term.

Suppose that a sam 
and the event of inter 
The consequences of left censoring depend on the model being esti-
mated. If the model specifies a hazard rate that does not depend on time
(the exponential model in the continuous-time case), there is no problem
whatever. One simply treats the initial censored interval as if it began at
the beginning of the observation period. Similarly, there is no inherent
difficulty if the hazard depends on age, except for the computational
problems noted earlier. Dependence on the time since the last event
poses serious problems, however, because for the initially censored
interval the time since the last event is not known. Treating the
interval as if it began at the beginning of the observation period will
undoubtedly introduce some bias. Flinn and Heckman (1982a, 1982b)
discuss possible solutions to this problem, but all are computa-
tionally cumbersome and depend on somewhat arbitrary assumptions.
An alternative approach is simply to discard the initially censored
intervals. While this represents a loss of information, it should not lead
to any biases.



7. CHANGE OF STATES


In this chapter we shall consider a class of models that allows for both
multiple kinds of events and repeated events. This class includes all the
previously discussed continuous-time models as special cases. Known as
Markov or semi-Markov models, these models have been previously
described in the social science literature by Coleman (1981), Tuma,
Hannan, and Groeneveld (1979), and others.

Markov models describe processes in which individuals are in one of
a set of mutually exclusive and exhaustive states at any point in time.
For example, in applying their model to the study of marital status,
Tuma et al. (1979) distinguished three states: married, unmarried, and
attrited. Knoke (1982) used a version of the model to study forms of city
government: commission, council-manager, mayor-council. DiPrete
(1981) studied employment versus unemployment. The set of possible
states is often called the state space. For the model to be of any interest,
it must be possible to make transitions between at least some of the
states, and it is assumed that those transitions can occur at any point in
time.

These state transitions are equivalent to the “events” discussed in the
preceding chapters. In fact, any kind of event considered thus far can be
thought of as a transition between states, although this may sometimes
be a bit artificial. For example, an arrest can be thought of as a
transition from having had, say, four arrests to having had five arrests.
Similarly, the birth of a child is a move from n children to n + 1 children.


Transition Rates


Transitions among these states are controlled by a set of unobserved
transition rates, which are defined as follows. Let P_ij(t, t + s) be the
probability that an individual who is in state i at time t will be in state j
at time t + s. Then the transition rates, denoted by r_ij(t), are given by


     r_ij(t) = lim  P_ij(t, t + s)/s                              [19]
               s→0


Notice the similarity between equation 19 and the definition of the
type-specific hazard functions in equation 16. In fact, the type-specific
hazard functions are just transition rates for the special case in which all
individuals begin in the same origin state. To put it another way,
transition rates can be regarded as type-specific hazard functions in
which events are distinguished both by origin and destination states.

The transition rates, in turn, are allowed to depend on time and the
explanatory variables, with the most common functional form being


     log r_ij(t) = a_ij(t - t') + b_ij x                          [20]


where t' is the time of the last transition, and a_ij(t - t') is a function of
the time since the last transition. In practice, a_ij(t - t') is often
constrained to be a constant a_ij, which implies that the time between
transitions is exponentially distributed. Implicit in this model are the
assumptions that time intervals between events are independent and,
for each ij combination, identically distributed.

This model, or variations thereof, can be estimated with a variety of
forms of data (Coleman, 1981). Nevertheless, the best form of data is
clearly an event history. We now show how to estimate
equation 20 using a standard program for estimating proportional
hazards models.
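
Before turning to estimation, the limit in equation 19 can be checked numerically for the simplest case, in which the transition rate is a constant and durations are therefore exponentially distributed. The sketch below assumes a single destination state and an arbitrary rate of 0.5; both are illustrative choices, and the function name is hypothetical.

```python
import math


def transition_probability(rate, s):
    """Probability of leaving the origin state within an interval of
    length s when the transition rate is constant (exponential durations)."""
    return 1.0 - math.exp(-rate * s)


rate = 0.5
# P(t, t + s)/s for progressively shorter intervals s.
ratios = [transition_probability(rate, s) / s for s in (1.0, 0.1, 0.01, 0.001)]
```

As s shrinks, the ratio of the transition probability to the interval length approaches the rate itself, which is the sense in which equation 19 defines an instantaneous rate.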
Analysis may be accomplished through the following steps:


(1) Break each individual’s event history into a set of intervals be-
tween events.

(2) Separate these intervals into groups according to origin state.

(3) For each origin state, estimate models for multiple kinds of
events, specifically for class IIa (competing risks). Each destina-
tion state corresponds to a different event type. As in Chapter 5, a
separate model may be estimated for each destination state,
treating all other destinations as censored at the end of the
interval.
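
The three steps above can be sketched as a small data-preparation routine. This is an illustration only: the function names, the representation of a history as a list of (entry time, state) pairs, and the two example histories are all assumptions introduced here. The resulting data sets would still have to be passed to a proportional hazards program for estimation.

```python
from collections import defaultdict


def build_intervals(history, end_time):
    """Step 1: turn one history, a time-ordered list of (entry_time, state)
    pairs, into (origin, destination, duration) intervals. The last interval
    is censored, which is marked by a destination of None."""
    intervals = []
    for (start, origin), nxt in zip(history, history[1:] + [None]):
        if nxt is None:
            intervals.append((origin, None, end_time - start))
        else:
            intervals.append((origin, nxt[1], nxt[0] - start))
    return intervals


def group_by_origin(intervals):
    """Step 2: separate the pooled intervals by origin state."""
    groups = defaultdict(list)
    for origin, destination, duration in intervals:
        groups[origin].append((destination, duration))
    return groups


def destination_dataset(group, destination):
    """Step 3: for one destination, code a transition to it as an event (1)
    and every other outcome, including censoring, as censored (0)."""
    return [(duration, 1 if dest == destination else 0)
            for dest, duration in group]


# Two hypothetical histories, with time measured in years.
pooled = (
    build_intervals([(0, "rated"), (4, "unrated"), (9, "nonacademic")], 15)
    + build_intervals([(0, "rated"), (3, "rated")], 10)
)
groups = group_by_origin(pooled)
rated_to_rated = destination_dataset(groups["rated"], "rated")
```

Note that the durations in each origin group are the same for every destination; only the event indicator changes from one model to the next.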


An Analysis of Job Changes


To illustrate this procedure, we shall analyze data on job changes of
477 physicists surveyed in 1966. (For a detailed description of the data
see Hagstrom, 1974.) Information on job histories of these physicists
was obtained from American Men and Women of Science (Cattell Press,
1966). Data were available on each job held by each physicist from the
receipt of the doctorate to the time of the survey in 1966. Since the
physicists were of widely varying ages in 1966, however, the length of the
job histories also varied widely across individuals.

For this analysis, jobs were classified by three types of employers: (1)
university departments whose “quality” was rated in Cartter’s (1966)
study; (2) four-year colleges and universities not rated by Cartter (usu-
ally lesser known institutions); and (3) nonacademic employers, includ-
ing government and industry. These three employer types are the three
states in a semi-Markov model of employer changes. The objective is to
estimate the effects of several explanatory variables on transitions
among these three states.

This situation departs somewhat from the typical semi-Markov
model in that it is possible to make a transition from any of the three
states back into the same state. Thus, one can make a transition from a
nonacademic employer to another nonacademic employer. While this
creates no analytical difficulties, it does increase the number of kinds of
transition. With three origin states and three destination states, there are
nine possible transitions, and a separate model will be estimated for each
one of them.

The 477 physicists held a total of 1069 jobs during the period of
observation. Of these jobs, 477 were censored because they were still in
progress when the study was terminated. The remaining 592 jobs ended
in transitions from one employer to another.
The explanatory variables were all treated as constant over time.
They include a quality rating of each physicist’s undergraduate institu-
tion, Cartter’s (1966) rating of the doctoral department, number of years
between the bachelor’s and doctoral degrees, a dummy variable indicat-
ing whether or not the individual received a postdoctoral fellowship, a
dummy variable indicating U.S. citizenship, a dummy variable for
“inbreeding” (employer is the physicist’s doctoral department), number
of previous jobs, Cartter rating of the current department (when availa-
ble), career age (time since Ph.D.), and the calendar year in which the
job began.

The model to be estimated is given by equation 20, where a_ij(t - t') is
left as an unspecified function of time. Hence, we have nine proportional
hazards models, each of which can be estimated using the partial likeli-
hood method. The first step is to divide the 1069 jobs into three groups
by the three origin states. This yields 651 rated academic jobs, 212
unrated academic jobs, and 206 nonacademic jobs.

Beginning with the rated academic jobs, three models are estimated,
one for each of the three destination states. For transitions to rated
academic jobs, any such transition (there were 202) is treated as an event
and all other outcomes are treated as censored. Similarly, in estimating
the second model, any change to an unrated academic job is treated as
an event and all other outcomes are treated as censored. The same
procedure is used for the transitions from rated academic to nonaca-
demic jobs. For all three models, the same subsample is analyzed and the
same duration time is specified for each individual. The difference in the
models is the specification of which outcomes are treated as events and
which are treated as censored. Appendix B shows how this can easily be
set up using the SAS program PHGLM.


TABLE 7 
Estimates of Proportional Hazards Models for Transitions Among Three Types of Jobs 


1 2 3 
From Rated Academic to: From Unrated Academic to: From Nonacademic to: 
Rated Unrated Rated Unrated Rated Unrated 
Explanatory Variables Academic Academic Nonacademic Academic Academic Nonacademic Academic Academic Nonacademic 
Undergraduate rating -009 010 O13 .035 014 049 -003 .032 —.018 
Graduate department rating 123 ~.011 —.228 ~.039 ~.341 —.433 121 —.269 224 
Time for Ph.D. -045 037 ~.017 —.106 041 —.025 —.026 AST —.106 
Postdoctoral Fellow (D)® -338* -009 .234 467 592 415 —0.15 AST 058 
U.S. Citizen (D) -138 -.340 P tgl —.222 ~.370 269 —.293 153 —.468 
Inbred (D) -210 .538 -146* —.149 -.484 150 a = iad 
Number of previous jobs -2537+ 212 .284* 338* 146 AMS Da” ~.038 .248 
Year job began ORI 025 006 041** -002 .036 .045** -001 —.014 
Career age -204*** ~.161*** —.163*** —.147*** —123*** —.160*** —.089*** —.105* —.075** 
Cartter rating -O1S -141 —.193 -= - - = 
N of cases 651 651 651 212 212 212 206 206 206 
N of job changes 202 42 48 55 43 20 109 25 48
-2 × log-likelihood 1934.97 426.54 486.57 436.52 336.24 148.97 960.77 218.82 413.41
a. (D) indicates dummy variable.
*Significant at .05 level.
**Significant at .01 level.
***Significant at .001 level.

more likely to change jobs again in the immediate future. The
career age coefficient of -.204 exponentiates to .82, indicating that each
additional year past the doctorate decreases the hazard of a move by
18 percent; over a five-year period the hazard would be reduced by 64
percent. The latter figure is obtained by calculating 100[1 -
exp(5[-.204])]. There are many possible explanations for this effect, but
we shall not pursue them here.
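
The arithmetic behind these percentages is easy to reproduce. The helper below is a minimal sketch; the function name is ours, and the only input taken from the analysis is the coefficient of -.204.

```python
import math


def percent_change_in_hazard(coefficient, units=1):
    """Percentage change in the hazard implied by a proportional hazards
    coefficient when the explanatory variable increases by `units`."""
    return 100.0 * (math.exp(coefficient * units) - 1.0)


one_year = percent_change_in_hazard(-0.204)        # change per additional year
five_years = percent_change_in_hazard(-0.204, 5)   # change over five years
```

A negative result is a reduction in the hazard, so the two values correspond to the 18 percent and 64 percent reductions quoted above.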

Fewer variables have significant effects in the next two columns
representing transitions to unrated academic and nonacademic jobs.
Although this is partly due to a decline in magnitude of some of the
estimated effects, it is largely a consequence of the reduced number of
observed transitions of these two types: there are only 42 changes from
rated academic to unrated academic, and only 48 changes from rated
academic to nonacademic. It is generally true in event history analysis
that, while censored observations do contribute some information to the
analysis, they do not add nearly as much information as uncensored
observations. As a result, significance levels and standard errors will
depend as much on the actual number of events as on the total number
of observations.
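
The event counts behind this point can be tallied from the “N of job changes” and “N of cases” rows of Table 7. The sketch below assumes that the counts for each origin state are listed in the column order rated, unrated, nonacademic; the text confirms that order only for the rated academic origin.

```python
# Observed transitions by origin state and destination state (Table 7).
job_changes = {
    "rated": {"rated": 202, "unrated": 42, "nonacademic": 48},
    "unrated": {"rated": 55, "unrated": 43, "nonacademic": 20},
    "nonacademic": {"rated": 109, "unrated": 25, "nonacademic": 48},
}
# Jobs (intervals) analyzed for each origin state.
jobs_by_origin = {"rated": 651, "unrated": 212, "nonacademic": 206}

changes_by_origin = {o: sum(d.values()) for o, d in job_changes.items()}
total_changes = sum(changes_by_origin.values())
total_jobs = sum(jobs_by_origin.values())
censored_jobs = total_jobs - total_changes

# Share of each subsample that experiences a given type of transition.
event_share = {
    (o, d): n / jobs_by_origin[o]
    for o, dests in job_changes.items()
    for d, n in dests.items()
}
```

The totals reproduce the 1069 jobs, 592 job changes, and 477 censored jobs reported earlier, and the shares show how few events some of the nine models rest on: only about 6 percent of rated academic jobs end in a move to an unrated academic job.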

The effect of career age is strong and roughly the same magnitude for
all three types of transitions. The coefficient for number of previous jobs
is also roughly the same magnitude for all three types, but its level of
statistical significance varies greatly. The calendar year in which the job
began has a significant effect only for rated to rated transitions, but its
numerical magnitude is actually larger for rated to unrated moves. One
surprising result is the substantial coefficient of “inbreeding” for rated
academic to nonacademic transitions. When exponentiated, the coeffi-
cient of .746 becomes 2.11, indicating that physicists whose current job
is in the university department in which they got their doctorate are
more than twice as likely as others to move to a nonacademic job.

The next phase of the analysis is to examine the 212 jobs in unrated
academic settings. Again the procedure is to estimate three models for
this subsample. In each model, the event is a transition to a particular
destination state and all other outcomes are treated as censored data.
Cartter rating could not be included as an explanatory variable because,
by definition of the origin state, it was not measured for these jobs.
Results are shown in the three middle columns of Table 7. The overall
pattern is similar to that for transitions out of rated academic
jobs, although fewer variables have effects that are statistically
significant. This is partly due to the fact that both the overall
number of cases and the number of observed transitions of each type are
substantially reduced.

The final phase of the analysis is to repeat the process one more time 
for transitions out of nonacademic jobs. Both Cartter rating and 
“inbreeding” had to be omitted as explanatory variables because they 
were not meaningful for nonacademic jobs. Results in Table 7 are again 
quite similar to those for the other origin states. 

What does this analysis tell us? The negative effect of career age is 
quite consistent across all the transition types. To a lesser degree, so are 
the effects of the number of previous jobs. The year the job began seems 
to be important in predicting transitions to rated academic jobs, but not 
to other destinations. The effects of postdoctoral fellowship and 
inbreeding show up for two types of transition but not for any others. 


Simplifying the Model

Given the similarities across transition types, it is reasonable to
inquire whether the results could be simplified in some way. In fact,
there is no necessary reason why one should estimate separate models
for all nine transition types. Both theory and empirical evidence can
suggest combining either origin states or destination states or both.
Table 8 gives results from estimating proportional hazards models in
which the three destination states are combined while the distinction
among origin states is maintained. The estimation procedure for these
models was quite straightforward. As with the models in Table 7,
intervals between events were subdivided by the three origin job types,
and a separate model was estimated for each origin type. No distinction
was made among destination states, however.

The coefficient estimates in Table 8 are approximately what one
would expect just from averaging coefficients in Table 7. There is one
notable change, however. For unrated academic jobs, the effect of a
postdoctoral fellowship is statistically significant in Table 8 but not in
Table 7. This is undoubtedly a consequence of the greater statistical
power obtained by combining destination job types.

It is also possible to test whether the simplification obtained by
combining destination job types is statistically justified. The null
hypothesis is that the explanatory variables have identical coefficients
across destination types but may differ by origin type. For each model
estimated, both Tables 7 and 8 report the log-likelihood (multiplied by
-2). A test statistic is obtained by adding all the log-likelihoods in Table
TABLE 8
Estimates of Proportional Hazards Models
Combining Destination Job Types

Explanatory Variables Rated Academic Unrated Academic Nonacademic


Undergraduate rating .006 -030 -001 
Graduate department rating —.128 —.189 -081 
Time for Ph.D. —.028 —.017 —.017 
Postdoctoral fellow (D)# .271* 492* —.036 
U. S. citizen (D) ~.175 —.181 ~.274 
Inbred (D) 349* —.494 i A 
Number of previous jobs .247*** .234* -213 b 
Year job began .019** .027** 020 
Career age —.188*+*+* ~.14] ##* .086* 
Cartter rating -.044 - -
N of cases 651 212 206
N of job changes 292 118 182
-2 × log-likelihood 2859.5 937.4 1614.8
a. (D) indicates dummy variable.
*Significant at .05 level.
**Significant at .01 level.
***Significant at .001 level.

7, doing the same in Table 8, and then taking the difference between the
two sums. Thus, we have


     48.8 = 5411.7 - 5362.9


[...] We may conclude that Table 8 is an acceptable simplification of
Table 7.
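
The test statistic can be reproduced from the reported values. In the sketch below, the Table 8 figures and the Table 7 sum of 5362.9 are taken from the text; the degrees of freedom are our own count of coefficients (10, 9, and 8 explanatory variables for the rated, unrated, and nonacademic origins), not a figure quoted from the original.

```python
# -2 x log-likelihood for the three combined-destination models (Table 8).
minus2ll_table8 = {"rated": 2859.5, "unrated": 937.4, "nonacademic": 1614.8}
# Sum of -2 x log-likelihood over the nine models of Table 7, as given
# in the text.
sum_table7 = 5362.9

sum_table8 = sum(minus2ll_table8.values())
chi_square = sum_table8 - sum_table7

# Assumed coefficient counts: Cartter rating is absent for unrated academic
# jobs; Cartter rating and inbreeding are both absent for nonacademic jobs.
coefficients_per_origin = {"rated": 10, "unrated": 9, "nonacademic": 8}
params_table8 = sum(coefficients_per_origin.values())
params_table7 = 3 * params_table8   # one model per destination state
degrees_of_freedom = params_table7 - params_table8
```

Under this count the statistic of 48.8 would be referred to a chi-square distribution with 54 degrees of freedom. Since the mean of that distribution equals its degrees of freedom, a value below 54 gives no grounds for rejecting the simpler model.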


[...] the log-likelihood (multiplied by -2) was [...]. The difference
between this log-likelihood and the sum of the log-likelihoods in
Table 8 yields a chi-square value of 1212.9 with [...] degrees of
freedom. Clearly, it is statistically unacceptable to collapse origin
states.
One must remember that the approaches taken in this chapter,
insofar as they are based on observation of repeated events on each
individual, share an important limitation discussed in the last
chapter. Specifically, it must be assumed that whatever dependence
exists among the intervals for a single individual is a consequence
of the explanatory variables that are included in the model. While the
consequences of violating this assumption are not well understood, one
can always take the conservative approach discussed in Chapter 6 of
modifying the standard errors so that they reflect the number of
individuals rather than the number of intervals or the number of
transitions.


8. CONCLUSION 


This monograph has focused on models in which the probability or
hazard for the occurrence of events depends on one or more explanatory
variables through an equation that closely resembles multiple
regression. Assuming that all relevant explanatory variables have been
included, these models can usually be given a causal interpretation
because the explanatory variables precede the event in time.

Many statistical texts on survival analysis, especially those
published before 1980, put considerable emphasis on "single-sample"
methods, where the aim is to describe the distribution of event times, or
on two-sample methods, where the aim is to compare two distributions of
event times. The development of effective regression methods has
rendered much of this material of limited value to social scientists,
however. On one hand, the shape of the distribution can be very
misleading unless one controls for sources of heterogeneity in the
sample. On the other hand, statistical tests for comparing two
distributions can be performed quite effectively within the regression
framework by using a dummy explanatory variable.

Among regression methods, the partial likelihood method with its
associated proportional hazards models is clearly the most appealing
approach. It can handle both continuous and discrete-time data with a
single algorithm. It is much less restrictive than some of the more
common parametric methods. Convenient and efficient programs are now
widely available in standard statistical packages. And it can readily
incorporate time-varying explanatory variables (although with greatly
increased computational cost). For all these reasons, partial likelihood
is a natural first choice in most situations.

There are two important limitations to the partial likelihood method,
however. First, the dependence of the hazard on time is treated as a 
nuisance function that cancels out of the estimating equations. If the 
nature of this dependence is of interest in itself, it may be necessary to 
shift to a parametric model. Although some partial likelihood programs 
(e.g., BMDP2L) enable one to recover graphical estimates of the 
dependence on time, these are often inadequate to draw firm inferences. 

The second limitation is that the proportional hazards model asso- 
ciated with the partial likelihood method does not include a disturbance 
term representing unobserved heterogeneity. In fact, Tuma (1982) has 
shown that partial likelihood will not accommodate any such disturbance
term. When only unrepeated events are studied, it is not clear that this 
limitation is of any serious consequence. For repeated events, on the 
other hand, the inclusion of a disturbance term allows one to model 
dependence among repeated events for a single individual. Although the
implications of such dependence have not been studied extensively, it 
would certainly be desirable to take it into account. Thus, those studying 
repeated events may want to consider some of the newer methods now 
being developed (Flinn and Heckman, 1982a, 1982b; Heckman and 
Singer, 1982). 

One final note. The methods described in this monograph are practi- 
cal, state-of-the-art approaches to the analysis of event histories. Never- 
theless, one must keep in mind that event history analysis (by whatever 
name) is a rapidly expanding field to which a large number of people are 
contributing. It would be surprising indeed if many important new
developments did not appear over the next several years. Since much of
this literature is quite technical, the social scientist who wants to stay
abreast of such developments may have to seek the advice of those who
are actively involved in the field.


APPENDIX A: 
MAXIMUM LIKELIHOOD AND PARTIAL LIKELIHOOD 


This appendix is for those who have some acquaintance with maximum
likelihood estimation and continuous distribution theory.


Maximum Likelihood 


The principle of maximum likelihood is to choose as parameter
estimates those values that maximize the likelihood (probability) of
observing the data that have actually been observed. The first step in
doing this is to express the likelihood of the data as a function of the
unknown parameters. We shall see how this may be accomplished for
parametric regression models.

Let us suppose that we have a sample of n independent individuals
(i = 1, ..., n). For each individual, the data consist of
$(t_i, d_i, x_i)$ where $t_i$ is either the time of event occurrence or
the time of censoring, $d_i$ is a dummy variable with a value of 1 if
$t_i$ is uncensored or 0 if $t_i$ is censored, and $x_i$ is a vector of
explanatory variables. If observations are independent, the likelihood
of the entire sample is just the product of the likelihoods for
individual observations; that is,

$$L = \prod_{i=1}^{n} L_i \tag{A1}$$

For uncensored observations, $L_i = f_i(t_i)$ where $f_i$ is the
probability density function (p.d.f.) for individual i. Note that $f_i$
is subscripted to denote that the p.d.f. depends on the vector of
explanatory variables, and thus differs across individuals. For censored
data $L_i = 1 - F_i(t_i)$ where $F_i$ is the cumulative distribution
function (c.d.f.). Thus, $1 - F_i(t_i)$ is the probability that an event
occurs after $t_i$ for individual i. We can combine these formulas into

$$L = \prod_{i=1}^{n} [f_i(t_i)]^{d_i} \, [1 - F_i(t_i)]^{(1-d_i)} \tag{A2}$$



To proceed further, we must make use of the following relationships
between the hazard function, the p.d.f., and the c.d.f.

$$h(t) = f(t) / [1 - F(t)] \tag{A3}$$

$$F(t) = 1 - \exp\left\{ -\int_0^t h(u)\,du \right\} \tag{A4}$$


Equation A3 expresses the fact that the hazard is a conditional density,
that is, it is the density for an event occurring at time t given that the
event has not already occurred. Equation A4 is obtained by solving the
first-order linear differential equation that arises from A3 (Kalbfleisch
and Prentice, 1980: 6).

Substituting these two equations into the preceding equation gives an
expression for the likelihood in terms of the hazard function:

$$L = \prod_{i=1}^{n} [h_i(t_i)]^{d_i} \exp\left\{ -\int_0^{t_i} h_i(u)\,du \right\} \tag{A5}$$


Since maximizing the logarithm of a function is equivalent to
maximizing the function itself, it is convenient to use the
log-likelihood:

$$\log L = \sum_{i=1}^{n} d_i \log h_i(t_i) - \sum_{i=1}^{n} \int_0^{t_i} h_i(u)\,du \tag{A6}$$

At this point one can substitute for $h_i$ whichever parametric model
has been chosen. For example, in the case of the exponential regression
model with a single explanatory variable, we have
$h_i(t) = \exp(a + b x_i)$. This leads to

$$\log L = \sum_{i=1}^{n} d_i (a + b x_i) - \sum_{i=1}^{n} t_i \exp(a + b x_i) \tag{A7}$$

We have now succeeded in expressing the likelihood as a function of the
unknown parameters, in this case a and b. The second step is to use some
numerical method (usually iterative) to maximize log L with respect to a
and b. The Newton-Raphson algorithm is usually satisfactory for this
purpose, and has the advantage of producing standard errors of the
estimates as a simple by-product. For further details, see Kalbfleisch
and Prentice (1980) or Lawless (1982).
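
The two steps described above can be illustrated with a minimal Python sketch for the exponential model with one explanatory variable. It is an illustration under stated assumptions, not the code of any of the programs discussed in this monograph: the function name, the starting values, the stopping rule, and the ten data points are all hypothetical. The standard errors are taken from the inverse of the negative matrix of second derivatives, which is the "by-product" referred to in the text.

```python
import math

def fit_exponential(t, d, x, tol=1e-10, max_iter=50):
    """Maximize the log-likelihood [A7] over (a, b) by Newton-Raphson.

    t: event or censoring times; d: 1 if uncensored, 0 if censored;
    x: a single explanatory variable. Returns (a, b, se_a, se_b).
    """
    # Hypothetical starting values: one common hazard (events / exposure)
    # and no effect of x.
    a, b = math.log(sum(d) / sum(t)), 0.0
    for _ in range(max_iter):
        ga = gb = haa = hab = hbb = 0.0
        for ti, di, xi in zip(t, d, x):
            m = ti * math.exp(a + b * xi)   # t_i times the hazard for case i
            ga += di - m                    # first derivative with respect to a
            gb += xi * (di - m)             # first derivative with respect to b
            haa -= m                        # second derivatives
            hab -= xi * m
            hbb -= xi * xi * m
        det = haa * hbb - hab * hab
        # Newton step: minus the inverse Hessian times the gradient (2 x 2 case).
        da = -(hbb * ga - hab * gb) / det
        db = -(haa * gb - hab * ga) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    # Standard errors from the inverse of the negative Hessian, evaluated at
    # the iterate just before the final (negligible) step.
    se_a = math.sqrt(-hbb / det)
    se_b = math.sqrt(-haa / det)
    return a, b, se_a, se_b

# Hypothetical data: five cases with x = 0 and five with x = 1.
t = [2, 5, 6, 12, 12, 4, 5, 9, 11, 12]
d = [1, 1, 1, 0, 0, 1, 0, 0, 1, 0]
x = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
a_hat, b_hat, se_a, se_b = fit_exponential(t, d, x)
print(a_hat, b_hat, se_a, se_b)
```

With a single dummy explanatory variable the estimates can be checked by hand: exp(a) should equal the number of events divided by the total exposure time in the x = 0 group, and exp(a + b) the same ratio in the x = 1 group.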


Partial Likelihood 


Partial likelihood is like maximum likelihood in that the first step is 
to construct a likelihood function that depends on the unknown 
parameters and the observed data. The second step is to find parameter 
values that maximize this function. However, the usual likelihood func- 
tion is a product of the likelihoods for all the individuals in the sample. 
The partial likelihood, on the other hand, is a product of likelihoods for
all events that are observed to occur. Thus,

$$PL = \prod_{k=1}^{K} L_k \tag{A8}$$

where PL is the partial likelihood and K is the total number of events in
the sample.

To understand how the $L_k$'s are constructed, let us consider the
hypothetical example in Table A1. Here we have a sample of 10 cases,
but only five events are observed; the other five cases are censored.
Three of the observations are censored at time 12, presumably because
the study ended at that point. Observation 4 is censored at time 5, and
Observation 6 is censored at time 9. In these two cases, censoring may
have occurred because of death, deliberate withdrawal from the study,
or inability to locate the individual in later follow-up [...].

For convenience, the observations are arranged in order according to
$t_i$, the time of censoring or event occurrence. The first event
occurred to individual 1 at time 2. At that time, all 10 individuals in
the sample were at risk of an event occurring. We now ask: Given that an
event occurred at time 2, what is the probability that it occurred to
individual 1 rather
TABLE A1
Example of Calculations for Partial Likelihood Estimation

 i    t_i    k    L_k
 1     2     1    e^{bx_1} / (e^{bx_1} + e^{bx_2} + ... + e^{bx_10})
 2     4     2    e^{bx_2} / (e^{bx_2} + e^{bx_3} + ... + e^{bx_10})
 3     5     3    e^{bx_3} / (e^{bx_3} + e^{bx_4} + ... + e^{bx_10})
 4     5*
 5     6     4    e^{bx_5} / (e^{bx_5} + e^{bx_6} + ... + e^{bx_10})
 6     9*
 7    11     5    e^{bx_7} / (e^{bx_7} + e^{bx_8} + e^{bx_9} + e^{bx_10})
 8    12*
 9    12*
10    12*

NOTE: [...] individuals; t_i = time of event occurrence or censoring for
individual i; [...] 5.
*Censored.



than to one of the other 9 cases? This probability is $L_1$. It may be
expressed as

$$L_1 = \frac{h_1(2)}{h_1(2) + h_2(2) + \cdots + h_{10}(2)} \tag{A9}$$


where, as before, $h_i(t)$ is the hazard for individual i at time t. Thus,
we take the hazard at time 2 for the individual who experienced the event
and divide by the sum of the hazards for all the individuals at risk at
time 2. While A9 has considerable intuitive appeal, a formal derivation is
actually quite tedious (Tuma, 1982) and will not be given here.

The expression for $L_1$ holds regardless of the model chosen for the
dependence of the hazard on time and the explanatory variables. Under
the proportional hazards model, however, it simplifies considerably.
For that model

$$h_i(t) = \exp(a(t) + b x_i) = \exp(a(t)) \exp(b x_i) \tag{A10}$$
where $x_i$ is a column vector of explanatory variables for individual i,
and b is a row vector of coefficients. When this is substituted into the
expression for $L_1$, the quantity exp(a(t)) cancels from each term,
leaving

$$L_1 = \frac{\exp(b x_1)}{\exp(b x_1) + \exp(b x_2) + \cdots + \exp(b x_{10})} \tag{A11}$$


It is this cancellation that makes it possible to estimate the coefficient 
vector b while completely disregarding the unspecified function a(t). 

$L_2$ is constructed in the same way. Given that an event occurred at
time 4, $L_2$ is the probability that the event occurred to individual 2
rather than to one of the other individuals at risk at time 4. The only
difference is that individual 1, having already experienced an event, is
no longer at risk at time 4. Therefore,

$$L_2 = \frac{\exp(b x_2)}{\exp(b x_2) + \exp(b x_3) + \cdots + \exp(b x_{10})} \tag{A12}$$


Formulas for $L_3$, $L_4$, and $L_5$ are given in Table A1.

Note that $L_k$ does not depend on the exact time at which the kth event
occurs. The event could occur at any point after the (k - 1)th event and
before the (k + 1)th event, and $L_k$ would still have the same magnitude.
It is only the order of the events that affects the partial likelihood.

Once the partial likelihood is constructed, it can be maximized like an
ordinary likelihood function using the Newton-Raphson algorithm
(Kalbfleisch and Prentice, 1980; Lawless, 1982).
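
The construction of the partial likelihood for Table A1 can be sketched in a few lines of Python. This is an illustration only: the function name and the values of the explanatory variable are hypothetical, a single explanatory variable with a scalar coefficient b is assumed, and tied event times (which do not occur in Table A1) are not given any special treatment. The times and censoring pattern are those of Table A1.

```python
import math

def log_partial_likelihood(b, times, events, x):
    """Sum of log L_k over the observed events (proportional hazards model)."""
    total = 0.0
    for t_k, d_k, x_k in zip(times, events, x):
        if d_k:  # censored cases contribute only through the risk sets
            # Risk set: everyone whose event or censoring time is at least t_k.
            risk_sum = sum(math.exp(b * x_j)
                           for t_j, x_j in zip(times, x) if t_j >= t_k)
            total += b * x_k - math.log(risk_sum)
    return total

# Times and censoring pattern of Table A1 (1 = event, 0 = censored).
times = [2, 4, 5, 5, 6, 9, 11, 12, 12, 12]
events = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]
# Hypothetical values of one explanatory variable for the 10 individuals.
x = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]

ll = log_partial_likelihood(0.5, times, events, x)

# Any order-preserving change of the time scale leaves the value unchanged.
stretched = [10 * t + 3 for t in times]
ll_stretched = log_partial_likelihood(0.5, stretched, events, x)
print(ll, ll_stretched)
```

With b = 0 every individual at risk is equally likely to experience the event, so the partial likelihood reduces to the product of the reciprocals of the risk set sizes (10, 9, 8, 6, and 4 for the five events in Table A1).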


APPENDIX B: 
PROGRAM LISTINGS FOR GLIM, 
SAS, AND BMDP EXAMPLES 


[A] Estimation of exponential regression model using GLIM3 (see
Chapter 3).

(1) $UNITS 432
(2) $DATA ARST WEEK FIN AGE RACE WEXP MAR PARO
    PRIO AGE1 EDUC WORK
(3) $DINPUT 1
(4) $CALC LWK = %LOG(WEEK)
(5) $YVAR ARST
(6) $ERROR P
(7) $LINK L
(8) $OFFSET LWK
(9) $FIT FIN, AGE, RACE, WEXP, MAR, PARO, PRIO, AGE1,
    EDUC, WORK
(10) $DISPLAY E


TRANSLATION 
(1) Set sample size at 432.
(2) Name the variables in the order in which they will be read.
    ARST is a dummy variable (1 if arrested, 0 if not).
(3) Read the data from unit 1.
(4) Compute LWK as the logarithm of WEEK.
(5) Declare ARST as the dependent variable.
(6) Specify a Poisson error distribution.
(7) Specify a logarithmic link, that is, the logarithm of the
    conditional mean of the dependent variable is a linear function
    of the explanatory variables.
(8) Include LWK as an explanatory variable, but force its
    coefficient to be 1.
(9) Estimate a model with the 10 named explanatory variables.
(10) Display the coefficients and standard errors.


The rationale for the peculiar model specified in lines 5-8 is given in 
Aitkin and Clayton (1980). If there are no censored data, the model can 
be fit more directly by specifying WEEK as the dependent variable, a 
logarithmic link, and an exponential error distribution. 
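
One way to see why the specification in lines 5-8 can work is to compare the Poisson log-likelihood for the censoring indicator, with the logarithm of time as an offset, against the exponential log-likelihood of equation A7. The Python sketch below makes that comparison numerically; it is offered as an illustration rather than as a restatement of the Aitkin and Clayton argument, and the function names and data are hypothetical. The two functions differ only by a term that does not involve a or b, so they are maximized by the same parameter values.

```python
import math

def exponential_loglik(a, b, t, d, x):
    """Log-likelihood [A7] for the exponential regression model."""
    return sum(di * (a + b * xi) - ti * math.exp(a + b * xi)
               for ti, di, xi in zip(t, d, x))

def poisson_loglik_with_offset(a, b, t, d, x):
    """Poisson log-likelihood for d_i with log mean = log(t_i) + a + b * x_i."""
    total = 0.0
    for ti, di, xi in zip(t, d, x):
        mu = math.exp(math.log(ti) + a + b * xi)
        total += di * math.log(mu) - mu - math.lgamma(di + 1)
    return total

# Hypothetical data: times, censoring indicator, one explanatory variable.
t = [2, 5, 6, 12, 12, 4, 5, 9, 11, 12]
d = [1, 1, 1, 0, 0, 1, 0, 0, 1, 0]
x = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

# The difference is the same at any (a, b): the sum of d_i * log(t_i).
constant = sum(di * math.log(ti) for ti, di in zip(t, d))
for a, b in [(-2.5, 0.0), (-1.0, 0.7)]:
    gap = poisson_loglik_with_offset(a, b, t, d, x) - exponential_loglik(a, b, t, d, x)
    print(a, b, gap, constant)
```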


[B] Partial likelihood estimation using SAS procedure PHGLM (see
Chapter 4).

(1) PROC SORT DATA = LIFE.DATA;
(2) BY DESCENDING WEEK;
(3) PROC PHGLM;
(4) EVENT ARST;
(5) MODEL WEEK = FIN AGE RACE WEXP MAR PARO
    PRIO AGE1 EDUC WORK;


TRANSLATION

(1), (2) Sort the data by week of arrest (or last week observed), in
    descending order. PHGLM requires that the data be
    sorted by the time of the event or censoring.
(3) Invoke PHGLM.
(4) Declare ARST as the censoring indicator (1 if arrested, 0
    if not).
(5) Estimate a model with WEEK as the event time and the
    10 named explanatory variables.


[C] Partial likelihood with a time-varying explanatory variable using
BMDP2L (see Chapter 4).


(1) /INPUT UNIT = 3. CODE = BMPLIFE.
(2) /FORM TIME = WEEK. STATUS = ARST. RESPONSE = 1.
(3) /REGRESSION
    COVARIATE = FIN,AGE,RACE,WEXP,MAR,PARO,
    PRIO,AGE1,EDUC.
(4) ADD = WRK.
(5) AUXILIARY = WORK1 TO WORK52.

FORTRAN subroutine, which must be compiled and linked with main
program:

(6) SUBROUTINE P2LFUN (Z,ZT,AUX,TIME,NFXCOV,
    NADD,NAUX,ISUB,X)
(7) DIMENSION Z(9),ZT(1),AUX(52)
(8) ZT(1) = AUX(TIME)
(9) RETURN
(10) END


TRANSLATION

(1) Read a BMDP file called BMPLIFE from unit 3.
(2) Declare WEEK as the event time, ARST as the censoring
    indicator, and 1 as the value of ARST indicating uncensored
    data.
(3) Specify a model with the nine named time-constant
    explanatory variables.
(4) Include a time-varying explanatory variable named WRK, not
    yet defined.
(5) Declare 52 variables to be used in defining the values of WRK.
    These are 52 dummy variables with a value of 1 if the person
    was employed in a particular week, otherwise 0.
(6) Subroutine statement described in BMDP manual. Note
    inclusion of last variable X, which is not documented in 1981
    manual. This is a missing value indicator, not used in this
    example.
(7) Construct arrays of nine constant explanatory variables, one
    time-varying explanatory variable, and 52 auxiliary variables.
(8) Define WRK at a particular TIME to have the value of the
    auxiliary variable measured at that TIME. Here TIME is
    equivalent to the variable WEEK.
(9) Return to the main program.


[D] Partial likelihood for nine types of transition using SAS
PHGLM (see Chapter 8).


(1) DATA PHYS.JOB;
(2) SET PHYS.JOB;
(3) IF TYPE2 = 1 THEN EV1 = 1; ELSE EV1 = 0;
(4) IF TYPE2 = 2 THEN EV2 = 1; ELSE EV2 = 0;
(5) IF TYPE2 = 3 THEN EV3 = 1; ELSE EV3 = 0;
(6) PROC SORT;
(7) BY DESCENDING DUR1;
(8) DATA;
(9) SET PHYS.JOB;
(10) IF TYPE1 NE 1 THEN DELETE;
(11) PROC PHGLM;
(12) EVENT EV1;
(13) MODEL DUR1 = UND GRAD TIME PDOC US INBR
     NJOBS YEAR CAGE CARTT;
(14) PROC PHGLM;
(15) EVENT EV2;
(16) MODEL DUR1 = UND GRAD TIME PDOC US INBR
     NJOBS YEAR CAGE CARTT;
(17) PROC PHGLM;
(18) EVENT EV3;
(19) MODEL DUR1 = UND GRAD TIME PDOC US INBR
     NJOBS YEAR CAGE CARTT;
(20) DATA;
(21) SET PHYS.JOB;
(22) IF TYPE1 NE 2 THEN DELETE;
(23) PROC PHGLM;
(24) EVENT EV1;
(25) MODEL DUR1 = UND GRAD TIME PDOC US INBR
     NJOBS YEAR CAGE;
(26) PROC PHGLM;
(27) EVENT EV2;
(28) MODEL DUR1 = UND GRAD TIME PDOC US INBR
     NJOBS YEAR CAGE;
(29) PROC PHGLM;
(30) EVENT EV3;
(31) MODEL DUR1 = UND GRAD TIME PDOC US INBR
     NJOBS YEAR CAGE;
(32) DATA;
(33) SET PHYS.JOB;
(34) IF TYPE1 NE 3 THEN DELETE;
(35) PROC PHGLM;
(36) EVENT EV1;
(37) MODEL DUR1 = UND GRAD TIME PDOC US NJOBS
     YEAR CAGE;
(38) PROC PHGLM;
(39) EVENT EV2;
(40) MODEL DUR1 = UND GRAD TIME PDOC US NJOBS
     YEAR CAGE;
(41) PROC PHGLM;
(42) EVENT EV3;
(43) MODEL DUR1 = UND GRAD TIME PDOC US NJOBS
     YEAR CAGE;


TRANSLATION

(1-5)   Create dummy variables corresponding to the three types
        of destination.
(6-7)   Sort the data by duration of job.
(8-10)  Exclude all origin jobs other than rated academic.
(11-13) Fit a model for destinations to rated academic jobs.
(14-16) Fit a model for destinations to unrated academic jobs.
(17-19) Fit a model for destinations to nonacademic jobs.
(20-22) Exclude all origin jobs other than unrated academic.
(23-31) Fit models for the three types of destination.
(32-34) Exclude all origin jobs other than nonacademic.
(35-43) Fit models for the three types of destination.
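
The data handling in listing [D] can be mirrored in a short Python sketch: each job is assigned to its origin type, and for each destination type an event indicator is set to 1 only if the job ended in a move to that destination, so that all other endings are treated as censored. The record layout (keys "type1", "type2", "dur") and the use of 0 for a job with no observed move are assumptions made for the illustration; the coding actually used in the PHYS.JOB file is not given in the listing.

```python
from collections import defaultdict

def split_by_transition(jobs):
    """Return {(origin, destination): [(duration, event), ...]}.

    Each job appears once per destination type; event is 1 only when the
    job ended in a move to that destination, otherwise the job is censored.
    """
    datasets = defaultdict(list)
    # Sort by descending duration, as the PROC SORT step does.
    for job in sorted(jobs, key=lambda j: j["dur"], reverse=True):
        for dest in (1, 2, 3):
            event = 1 if job["type2"] == dest else 0
            datasets[(job["type1"], dest)].append((job["dur"], event))
    return dict(datasets)

# Hypothetical records: origin type, destination type (0 = no move observed),
# and duration of the job.
jobs = [
    {"type1": 1, "type2": 2, "dur": 3.0},
    {"type1": 1, "type2": 0, "dur": 5.0},
    {"type1": 3, "type2": 1, "dur": 1.5},
]
datasets = split_by_transition(jobs)
print(datasets[(1, 2)])
```

Each of the resulting lists would then be passed to a separate partial likelihood fit, one per combination of origin and destination.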


APPENDIX C: 
COMPUTER PROGRAMS 


Here is a brief description of several publicly available computer 
programs that will estimate one or more of the models discussed here.
It is current as of spring 1984. Table C1 gives a summary of the models 
that can be estimated and some of the features that are available. 


BMDP2L 


BMDP2L (Dixon, 1981) is a program in the BMDP statistical package
for estimating proportional hazards models. It has many useful
features and options. For further information, contact


BMDP Statistical Software 
1964 Westwood Blvd., Suite 202 
Los Angeles, CA 90025 

(213) 475-5700 


CENSOR 


CENSOR (Meeker and Duke, 1981) is a FORTRAN program for
estimating parametric models. Although it was designed primarily for
fitting various distributions to univariate data, it will also estimate
several regression models. It is currently available from the author for
$50. Contact


William Q. Meeker, Jr. 
Department of Statistics 
Iowa State University 
Ames, IA 50001 

(515) 294-5336 


GLIM 


GLIM (Baker and Nelder, 1978) is a FORTRAN program for interactive
fitting of "generalized linear models," a family that includes
ordinary regression, logit, probit, and log-linear models. If there are no
censored data, GLIM3 can fit the exponential regression model directly.
Using special procedures (Aitkin and Clayton, 1980; Roger and Peacock,
1983), it can also fit the exponential, Weibull, and log-logistic
models to censored data. GLIM4, which was scheduled for release in the
summer of 1984, is supposed to fit these models directly, as well as the
proportional hazards model. Contact


The GLIM Co-ordinator 
NAG Central Office 
Mayfield House 

256 Banbury Road 
Oxford OX2 7DE

UK 


PHGLM 


PHGLM [...] is part of the supplemental program library of the SAS
statistical package. It does partial likelihood estimation of the
proportional hazards model. Because it is user contributed, it is not
fully supported by SAS. PHGLM is one of the easiest partial likelihood
programs to use, but does not have as many features as BMDP2L or
SURVREG. Contact


SAS Institute Inc. 
Box 8000 
Cary, NC 27511 


RATE 


RATE (Tuma, 1979) is a FORTRAN program to estimate the Markov
renewal model of Chapter 7, and several variants of that model.
Version 3, recently released, is priced at approximately $175 and is
available from


[...] Corporation
P.O. Box 881
Palo Alto, CA 94302
(415) 856-4770


SURVREG 


SURVREG (Preston and Clarkson, 1983) is a FORTRAN program
for estimating both parametric and proportional hazards models.
TABLE C1
Models and Features Available in Six Computer Programs

Programs (columns): BMDP2L, CENSOR, GLIM3, PHGLM(a), RATE, SURVREG
Models (rows): Proportional hazards, Weibull, Gompertz, Exponential,
    Log-normal, Log-logistic
Features (rows): Stratification, Time-varying explanatory variable,
    Diagnostic plots
[The entries marking which program offers which model or feature are
not legible in this copy.]

a. 1983 version.
b. Version 2.
c. Allows time-varying explanatory variables for parametric models, but
   not for proportional hazards model.



It appears to be one of the most general and flexible programs available.
Designed for interactive use, it can also be used in batch mode. It is 
currently available for $100 from the author: 


Douglas B. Clarkson 
Department of Mathematics 
University of Missouri 

St. Louis, MO 63121 


NOTES 


1. In some cases, [...] measured more frequently or less frequently than
the intervals used for constructing the [...]. [...] (1982) and Tuma
(1982) discuss different ways of dealing with this [...].

2. [...] estimate models in which the coefficients of one or more
explanatory variables are allowed to vary at each of the discrete time
points. This is done by multiplying a given explanatory variable by each
of the four dummy variables in Model 2. These product terms are then
added to Model 2 to create an [...].

3. [...] be noted that these results are very similar to the results
[...] and Lenihan (1980) using the much more familiar procedure of
ordinary [...] with a dummy dependent variable indicating whether or not
an arrest occurred [...]. [...] in Chapter 1, their procedure discards
information on variability [...] for those who were arrested. That
doesn't seem to make much difference [...], however, perhaps because
only 27 percent of the cases were arrested and the interval of
observation was relatively short.

4. [...] will be sufficient evidence to discriminate among alternative
[...]. In demography, for example, it is well known that the Gompertz
distribution provides a better description of human mortality at ages
above 25 than does the Weibull distribution.

5. [...] it is possible to use the partial likelihood method to estimate
[...] ordinal data other than event history data. This approach is
closely related to the [...] methods for ordinal data proposed by
McCullagh (1980).

6. This can be explained as follows. Suppose we wish to estimate the
model

$$\log h(t) = a(t) + b_1 x + b_2 w(t)$$

where t is the length of time since a marriage began, h(t) is the hazard
for divorce at time t, w(t) is the wife's age at time t, x is some
explanatory variable that is constant over time, and a(t) is an
arbitrary function. Notice that w(t) = t + w(0) where w(0) is the wife's
age at marriage. Substituting into the model above gives

$$\log h(t) = a(t) + b_2 t + b_1 x + b_2 w(0)$$

This may be written as

$$\log h(t) = a_2(t) + b_1 x + b_2 w(0)$$

where $a_2(t) = a(t) + b_2 t$, a different arbitrary function of time.
Thus, the time-varying component of the wife's age is absorbed into the
arbitrary function.

7. Most programs for partial likelihood estimation presume that all
individuals enter the risk set at the same time, t = 0. What is needed is
a program that will allow individuals to enter the risk set at different
points on the time axis.

8. Right censoring means that the length of an interval is known to be
greater than a certain value, but the exact length is not known. Left
censoring, on the other hand, means that the length of an interval is
known to be less than some value even though the exact length is not
known (Turnbull, 1974). By these definitions, the initial intervals for
the kind of data under discussion are right censored rather than left
censored. These intervals cannot be treated by the usual methods for
right-censored data, however, because initial intervals have a different
distribution than later intervals (Flinn and Heckman, 1982a, 1982b).

9. A complication in estimating this model is that some of the variables
were not defined for some of the origin states. In the combined model,
the "extra" variables were assigned values of 0 whenever they were not
defined for a particular origin job type.

10. Note, however, that the time-ordering may be somewhat ambiguous in
the case of time-varying explanatory variables. In many cases,
individuals are able to anticipate the occurrence of an event and alter
their behavior prior to the actual occurrence. An obvious example is the
case in which women who are expecting a child drop out of school prior to
the actual birth.